NOVEL OLIGONUCLEOTIDE ARRAYS AND THEIR USE FOR SORTING,
ISOLATING, SEQUENCING, AND MANIPULATING NUCLEIC ACIDS



Field of the Invention 



This invention is in the field of sorting, isolating, 
sequencing, and manipulating nucleic acids. 



Background of the Invention 



Ordered arrays of oligonucleotides ("oligos") immobilized on
a solid support have been proposed for sequencing DNA fragments.
It has been recognized that hybridization of a cloned single-stranded
DNA fragment to all possible oligo probes of a given
length can identify the corresponding, complementary oligo 
segments that are present somewhere in the fragment, and that 
this information can sometimes be used to determine the DNA 
sequence. Use of arrays can greatly facilitate the surveying of 
a DNA fragment's oligo segments. 

In an oligonucleotide array each oligo probe is immobilized 
on a solid support at a different predetermined position. The 
array allows one to simultaneously survey all the oligo segments 
in a DNA fragment strand. Many copies of the strand are 
required, of course. Ideally, surveying is carried out under
conditions to ensure that only perfectly matched hybrids will 
form. Oligo segments present in the strand can be identified by 
determining those positions in the array where hybridization 
occurs. The nucleotide sequence of the DNA sometimes can be 
ascertained by ordering the identified oligo segments in an 
overlapping fashion. For every identified oligo segment, there 
must be another oligo segment whose sequence overlaps it by all 
but one nucleotide. The entire sequence of the DNA strand can be 
represented by a series of overlapping oligos, each of equal 
length, and each located one nucleotide further along the 
sequence. As long as every overlap is unique, all of the identified
oligos can be assembled into a contiguous sequence block.

There is an important limitation to sequencing by known 
surveying techniques. As relatively longer DNA strands are 
surveyed, there is an increasing probability that more than two
identified oligos will share the same overlapping sequence, i.e.,
the overlap is not unique. When this occurs, the sequence of the
DNA cannot be unambiguously determined. Instead of one contiguous
sequence block that contains the entire DNA sequence, the
oligos can only be assembled into a number of smaller sequence
blocks, whose order is not known.
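
As a rough illustration of the overlap assembly described above, and of how it breaks down when an overlap is not unique, the following Python sketch may help. It is not part of the disclosed method; the function names `spectrum` and `assemble_blocks` and the example sequences are hypothetical, and the sketch assumes an error-free survey of a single linear strand.

```python
from collections import defaultdict


def spectrum(seq, n):
    """Return the set of all n-long oligo segments present in seq."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def assemble_blocks(oligos):
    """Join oligos into sequence blocks wherever the (n-1) overlap is unique.

    Assumption of this sketch: a block is extended only while exactly one
    oligo ends with the overlap and exactly one oligo begins with it.
    Purely circular overlap chains are not handled.
    """
    n = len(next(iter(oligos)))
    by_prefix = defaultdict(list)  # (n-1)-long prefix -> oligos starting with it
    by_suffix = defaultdict(list)  # (n-1)-long suffix -> oligos ending with it
    for o in oligos:
        by_prefix[o[:n - 1]].append(o)
        by_suffix[o[1:]].append(o)

    def unique_next(o):
        overlap = o[1:]
        if len(by_prefix[overlap]) == 1 and len(by_suffix[overlap]) == 1:
            return by_prefix[overlap][0]
        return None

    def has_unique_prev(o):
        overlap = o[:n - 1]
        return len(by_suffix[overlap]) == 1 and len(by_prefix[overlap]) == 1

    blocks = []
    for o in sorted(oligos):
        if has_unique_prev(o):
            continue  # o lies inside a block that starts elsewhere
        block, cur = o, o
        while True:
            nxt = unique_next(cur)
            if nxt is None:
                break
            block += nxt[-1]  # each step adds one nucleotide
            cur = nxt
        blocks.append(block)
    return blocks
```

With 3-long oligos, the strand `ATGCGTAC` has only unique overlaps and is recovered as one block, whereas `CATGATTGAC` contains repeated overlaps and falls apart into several short blocks whose order cannot be deduced from the oligo list alone.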

Summary of the Invention 

We have invented new oligonucleotide arrays and methods of 
using them.

A "binary array" according to the invention contains 
immobilized oligos comprised of two sequence segments of prede- 
termined length, one variable and the other constant. The 
constant segment is the same in every oligo of the array. The 
variable segments can vary both in sequence and length. Binary 
arrays have advantages compared with ordinary arrays: (1) they 
can be used to sort strands according to their terminal sequen- 
ces, so that each strand binds to a fixed location (an address) 
within the array; (2) longer oligos can be used on an array of a 
given size, thereby increasing the selectivity of hybridization; 
this allows strands to be sorted according to the identity of 
internal oligo segments adjacent to a particular constant 
sequence (such as a segment adjacent to a recognition site for a 
particular restriction endonuclease), and this allows strands to
be surveyed for the presence of signature oligos that contain a 
constant segment in addition to a variable segment; (3) universal 
sequences, such as priming sites, can be introduced into the 
termini of sorted strands using the binary arrays, thereby 
enabling the strands' specific amplification without synthesizing
primers specific for each strand, and without knowledge of each
strand's terminal sequences; and (4) the specificity of hybrid- 
ization during surveying can be increased by coupling hybridiza- 
tion to a ligation event that discriminates against terminal 
basepair mismatches. 

A "sectioned array" as used herein is one divided into 
sections, so that every individual area is mechanically separated 
from all other areas, such as, for example, a depression on the
surface, or a "well". The areas have different oligos immobi- 
lized thereon. A sectioned array allows many reactions to be 
performed simultaneously, both on the surface of the solid 
support and in solution, without mixing the products of different 
reactions. The reactions occurring in different wells are highly 
specific due to the nucleotide sequence of the immobilized oligo. 
A large number of sortings and manipulations of nucleic acids can 
be carried out in parallel, by amplifying or modifying only those 
nucleic acids in each well that are perfectly hybridized to the 
immobilized oligos. Nucleic acids prepared on a sectioned array 
can be transferred to other arrays (replicated) by direct blotting
of the wells' contents (printing), without mixing the
contents of different wells of the same array. Furthermore, the 
presence of individual sections in arrays allows multiple re- 
hybridizations of bound nucleic acids to be performed, resulting 
in a significant increase in hybridization specificity. It is 
particularly advantageous according to this invention to use a
binary array that is sectioned. 

Our invention includes methods of using sectioned arrays to 
sort mixtures of nucleic acid strands, either RNA or DNA. As
used herein, "strand" means not just a single strand, but multiple
copies thereof; and "mixture of strands" means a mixture of
copies of different strands no matter how many copies of each are
present. Similarly "fragment" refers to multiple copies thereof,
and "mixture of fragments" means a mixture of copies of different
fragments. The methods include sorting strands either according
to their terminal oligo segments (3'-terminal or 5'-terminal), or
according to their internal oligo segments on a binary array.
Before or after sorting, universal priming region(s) can be added
to the strands' termini to enable amplification. Binary sectioned
arrays for sorting according to strands' terminal sequences
("terminal sequence sorting arrays") can be comprehensive. A
"comprehensive array" is one wherein any possible strand will 
hybridize to at least one immobilized oligo. This type of 
sorting is particularly useful for preparing comprehensive 
libraries of fragments of a large genome. For example, in one 
embodiment of the invention, strands of restriction fragments
have their restriction sites restored and are sorted on a binary 
array. That array contains immobilized oligos whose constant 
segments contain the sequence complementary to the restriction 
site, and an adjacent variable segment. The array is complete, 
containing all variable sequences of each type in separate areas. 

Our invention also includes using sectioned arrays for 
preparing every possible partial copy of a strand or a group of 
strands. The term "partial" refers to multiple copies thereof.
Partials are prepared by either of the following methods: (1)
terminal sorting on a binary sectioned array of a mixture of all 
possible partial strands generated by random degradation of a 
parental strand; or (2) generation of partials directly on an 
array, through the sorting on an ordinary sectioned array of 
parental strands according to the identity of their internal 
oligo sequences, followed by the synthesis of partial copies of 
each parental strand by enzymatic extension of the immobilized 
oligos utilizing the hybridized parental strands as templates. 
In either case, generated partials correspond to a parental 
strand whose 3' or 5' end is truncated to all possible extents
(at the "variable" end of the partial), and whose other end is
preserved (at the "fixed" end of the partial). These are "one-sided
partials." Unless otherwise indicated the word "partial"
is used herein to refer to one-sided partials. 
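
The notion of a complete set of one-sided partials, grouped by the oligo at the variable end, can be sketched in Python as follows. This is an illustrative model only: the function name `one_sided_partials` and its parameters are hypothetical, and the sketch treats the partials as truncated copies of the parental strand itself rather than of its complement.

```python
def one_sided_partials(parent, n, truncate_5prime=True):
    """Group every one-sided partial of `parent` by its variable-end oligo.

    parent          -- parental strand written 5'->3'
    n               -- length of the oligo that serves as the address
    truncate_5prime -- if True the 3' end is fixed and the 5' end varies;
                       if False the 5' end is fixed and the 3' end varies
    Returns a dict mapping each address oligo to the partials in that well.
    """
    wells = {}
    for i in range(len(parent) - n + 1):
        if truncate_5prime:
            partial = parent[i:]          # 5' end truncated to position i
            address = partial[:n]         # oligo at the variable (5') end
        else:
            partial = parent[:i + n]      # 3' end truncated after position i+n
            address = partial[-n:]        # oligo at the variable (3') end
        wells.setdefault(address, []).append(partial)
    return wells
```

When an oligo occurs more than once in the parental strand, the corresponding well holds more than one partial; for example, the strand `ACGACG` with `n=3` and a variable 3' end places both `ACG` and `ACGACG` at the address `ACG`.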

Our invention also includes methods of using oligo arrays to 
obtain oligo information as part of a process for determining the 
nucleotide sequence of a long nucleic acid strand, or of many 
nucleic acid strands in an unknown mixture. A complete set of 
one-sided partials of the strand or strands is prepared on a
sectioned array, and the oligo content of the partial strands in
each well of the array is separately surveyed (i.e., each group of
partials sharing the same oligo at the partials' variable end is
surveyed).

Our invention also includes methods of using oligo arrays 
for ordering previously sequenced fragments from a first restric- 
tion digest of a large nucleic acid or even a genome. 

Our invention also includes methods of using oligo arrays
for allocating sequenced and ordered allelic fragments into their 
chromosomal linkage groups. 

Our invention also includes a method of using binary arrays 
for surveying the oligos contained in strands or their partials. 
This method provides improved comprehensive surveys over the 
conventional surveying of oligos on an ordinary array. 

Brief Description of the Drawings 

Figure 1 shows a binary array.

Figure 1a shows an oligo immobilized in an area of a binary
array. 

Figure 2 shows a sectioned array having depressions. 

Figure 2a shows a well of a sectioned array. 

Figure 3 shows addition of a lattice to a support to make a 
sectioned array. 

Figure 4 shows an example of sorting and amplification of 
restriction fragments on a sectioned binary array. 

Figure 5 shows an example of preparing partials on a sec- 
tioned ordinary array. 

Figure 6 shows, schematically, the order of steps for 
sequencing a complete genome. 

Figure 7 shows, schematically, the use of a sheet with a 
number of miniature survey arrays for simultaneously surveying
every well in a partialing array.

Figures 8 to 11 show examples of the determination of 
nucleotide sequences from indexed address sets obtained from 
analysis of mixtures of strands. 

Detailed Description of the Invention 

I. Oligonucleotide arrays 

As used herein an "oligonucleotide array" is an array of 
regularly situated areas on a solid support wherein different 
oligos are immobilized, typically by covalent linkage. Each area 
contains a different oligo whose location is predetermined. 

Arrays can be classified by the composition of their
immobilized oligos. "Ordinary arrays" contain oligos comprised 
entirely of "variable segments". Every position of the oligo 
sequence in such a segment can be occupied by any one of the four 
commonly occurring nucleotides.

Comprehensive ordinary arrays are those wherein any segment 
of any possible strand will hybridize perfectly to the length of 
one or more immobilized oligos so that no strand is lost. 

Binary arrays differ from ordinary arrays. A binary array 
is illustrated in Figures 1 and 1a. Figure 1 shows a substrate
or support 1 having immobilized thereon an array of oligos 3,
each oligo being in a separate area 2 of support 1. Figure 1a
shows one area 2. A binary oligo 3 (many copies, of course) 
comprised of constant region 5 and variable region 6 is cova- 
lently bound to support 1 by covalent linking moiety 4. 

Because of the constant segments, binary arrays provide 
means for the hybridization of longer sequences without increas- 
ing the size of the array. The constant segment can be located 
within the immobilized oligo either "upstream" of the variable 
segment (i.e., toward or at the 5' end of the oligo) or "downstream"
from the variable segment (i.e., toward or at the 3' end
of the oligo). The type of array that is chosen depends on the
specific application. The constant region preferably is or
includes a good priming region for amplification of hybridized
strands by a polymerase chain reaction (PCR), or a promoter for
copying the strand by transcription. Generally a length of 15 to
25 nucleotides is suitable for priming. The constant region can
contain all or part of the complement of a restriction site. A
binary array can be "plain" or "sectioned" (see below).
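
The point that a constant segment lengthens the hybridizing sequence without enlarging the array can be made concrete with a short Python sketch. The function name `binary_array`, its parameters, and the example constant segment are hypothetical and serve only as an illustration.

```python
from itertools import product


def binary_array(constant, var_len, constant_upstream=True):
    """Map each address (variable segment) to the oligo immobilized there.

    All oligos are written 5'->3'. The number of areas depends only on the
    length of the variable segment (4 ** var_len); the constant segment adds
    length to every oligo without adding areas.
    """
    array = {}
    for bases in product("ACGT", repeat=var_len):
        variable = "".join(bases)
        if constant_upstream:
            array[variable] = constant + variable   # constant toward the 5' end
        else:
            array[variable] = variable + constant   # constant toward the 3' end
    return array
```

For instance, a 3-nucleotide variable segment gives 64 areas whether or not a 6-nucleotide constant segment is attached, but each immobilized oligo is then 9 nucleotides long instead of 3.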

"Plain arrays" known in the art are arrays in which the 
individual areas are not physically separated from one another.
Reactions carried out simultaneously are limited to those in
which the nucleic acid templates and the reaction products are 
bound in some manner to the surface of the array to avoid the 
intermixing of products. 

"Sectioned arrays" are divided into sections, so that each 
area is physically separated by mechanical or other means (e.g., 
a gel) from all the other areas, e.g., as a depression on the surface,
called a "well". There are many techniques apparent to one
skilled in the art for preventing the exchange of materials 
between areas; any such method can be used to make a "sectioned" 
array, as that term is used herein, even though there might not 
be a physical wall between areas. 

One type of sectioned array is illustrated in Figures 2 and 
2a. Figure 2 shows a support sheet 60 having an array of depres- 
sions or wells 62, each containing many copies of an immobilized 
oligo 64. Figure 2a shows one well 62 of the array of Figure 2. 
Well 62 formed in support 60 has therein oligo 64 covalently 
bound to support 60 by covalent linking moiety 66. In practice 
one may prepare a plain array, e.g., on a flat sheet, and then, 
at a point during a series of steps involving its use, convert 
the array into a sectioned array, e.g., by making physical 
depressions in a deformable solid support to isolate the 
individual areas. The sectioned array can also be created by 
applying a lattice to the solid support and bonding it to the 
surface so that each area is surrounded by impermeable walls. An 
exploded perspective view of such a sectioned array is shown in 
Figure 3. Support or substrate 70, here a planar sheet, has 
mounted thereon and affixed thereto a lattice 72 comprised of a 
series of horizontal members 74, 76. The lattice members define 
a series of open areas which, in conjunction with support 70, 
define an array of wells 78. In some applications it is preferable
to utilize a detachable lattice (or a removable cover
sheet), so that the sectioned array can be converted back to a
plain array. 

Sectioned arrays according to this invention can be used to 
increase the specificity of hybridization of nucleic acids to the 
immobilized oligos. After hybridization, unhybridized strands 
can be washed away. Hybridized strands can then be released into 
solution without mixing. Released strands can be rebound to the 
immobilized oligos, and unhybridized strands can be washed away. 
Each successive release, rebinding, and washing increases the 
ratio of perfectly matched hybrids to mismatched hybrids. 
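
The cumulative effect of repeated release, rebinding, and washing can be pictured with a toy calculation. The model below is an assumption made only for illustration: it supposes that each cycle retains a fixed fraction of perfectly matched hybrids and a smaller fixed fraction of mismatched hybrids. The function name and the numerical values are hypothetical and are not taken from this specification.

```python
def enrichment(initial_ratio, keep_perfect, keep_mismatch, cycles):
    """Ratio of perfect to mismatched hybrids after repeated cycles.

    initial_ratio -- perfect:mismatched ratio after the first hybridization
    keep_perfect  -- assumed fraction of perfect hybrids retained per cycle
    keep_mismatch -- assumed fraction of mismatched hybrids retained per cycle
    cycles        -- number of release/rebind/wash cycles
    """
    ratio = initial_ratio
    for _ in range(cycles):
        # each cycle multiplies the ratio by the same selectivity factor
        ratio *= keep_perfect / keep_mismatch
    return ratio
```

Under these assumptions the ratio grows geometrically with the number of cycles, which is consistent with the statement that each successive cycle increases the ratio of perfectly matched to mismatched hybrids.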

An array can be "3'" or "5'". "3' arrays" possess free 3'
termini and "5' arrays" possess free 5' termini. The immobilized
oligos in a 3' array can be extended at their 3' termini by
incubation with a nucleic acid polymerase. If it is a template- 
directed polymerase, only immobilized oligos hybridized to a 
template strand can be extended. 

Methods of oligodeoxyribonucleotide synthesis directly on a 
solid support are also known in the art, including methods 
wherein synthesis occurs in the 3' to 5' direction (so that the 
oligos will possess free 5' termini). Methods wherein synthesis
occurs in the 5' to 3' direction (so that the oligos will possess
free 3' termini) are also known.

Suitable substrates or supports for arrays should be non-reactive
with reagents to be used in processing, be washable under
stringent conditions, not interfere with hybridization, and not be
subject to inordinate non-specific binding. Examples include treated
glass, polymers of various kinds (e.g., polyamide and polyacrylmorpholide),
latex-coated substrates, and silica chips.

Arrays can be made over a wide range of sizes. In the 
example of a square sheet, the length of a side can vary from a 
few millimeters to several meters. 

II. Sorting nucleic acids 

Our invention allows mixtures of strands to be sorted 
according either to their terminal oligo segments ("terminal 
sorting") or their internal oligo segments ("internal sorting")
on a binary array. 

There are two important aspects of our invention for sort- 
ing. First, each strand in a mixture can be made to hybridize at 
only a few, or a single, location. And second, each strand can 
be provided with universal terminal priming regions that enable 
PCR amplification without prior knowledge of the terminal nucleotide
sequences and without the need to synthesize individual
primers.
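
A simplified model of terminal sorting is sketched below in Python. It assumes, for illustration only, that every strand carries the same universal priming region at its 3' terminus and that the immobilized oligo in each well is complementary to that region plus the adjacent terminal segment of the strand. The names `revcomp`, `terminal_sort`, and the example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_sort(strands, priming_region, var_len):
    """Assign each strand to the single well matching its terminal segment.

    strands        -- strands written 5'->3', each expected to end with
                      `priming_region`
    priming_region -- universal 3'-terminal sequence shared by the strands
    var_len        -- length of the variable segment of the immobilized oligos
    Returns a dict keyed by the variable segment of the immobilized oligo
    (the reverse complement of the strand's segment next to the priming region).
    """
    wells = {}
    for strand in strands:
        if not strand.endswith(priming_region):
            continue  # no universal terminus, so the strand is not bound
        core = strand[:-len(priming_region)]
        if len(core) < var_len:
            continue  # too short to carry a full variable segment
        address = revcomp(core[-var_len:])
        wells.setdefault(address, []).append(strand)
    return wells
```

Because the address is determined by the strand's own terminal sequence, each strand lands in exactly one well, and every strand in every well can afterwards be amplified with the same primer for the universal region.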

For terminal sorting, the priming region(s) can be made
essentially dissimilar from the sequences occurring in the 

adjacent to internal restriction sites of a particular type
(utilizing a sectioned binary array). The latter approach is
useful for ordering sequenced restriction fragments. The sorting 
of strands by their internal segments on a 3' sectioned ordinary 
array is useful for the generation of partial strands by virtue 
of extension of the immobilized oligos. 

Our invention includes the sorting, in particular for 
sequencing, of natural mixtures of RNA molecules, such as 
cellular RNAs. Establishing messenger RNA sequences is useful, 
not only for the identification and localization of genes in the 
genomic DNA, but also for providing information necessary to 
determine the coding gene sequences (i.e., the exon/intron structure
of each gene). Furthermore, the analysis of cellular RNAs
in different tissues, at different stages of development, and in 
the course of a disease, will clarify which genes are active. 
Usually, RNAs are short enough to be sorted and analyzed without 
preliminary fragmentation. 

III. Preparing partial strands of nucleic acids on sectioned 
arrays 

Our invention includes methods of using sectioned arrays for 
preparing all possible partial copies of a strand or a group of 
strands. Preparing complete sets of partials of a strand(s), and
sorting the partials by their variable ends is especially useful
in a process for determining the sequence of the strand or
strands. The preparation of partials is accomplished by either
of the following methods: (1) terminally sorting on sectioned
binary arrays a mixture of partial strands generated by degradation
of a "parental" strand(s) at random; or (2) generating
partials on a sectioned ordinary array, through the sorting of a
parental strand(s) according to the identity of the strand's
internal sequences, followed by the synthesis of (complementary)
partial copies of the parental strand(s) by the enzymatic extension
of the immobilized oligos, utilizing the hybridized parental
strands as templates, and then copying the immobilized partials.

[...] corresponding number of different partials, all sharing the same
sequence at their variable ends.

In the second case (sorting before partialing), partials are
prepared directly from the parental strands that are hybridized
to a sectioned ordinary array without prior degradation. A
strand or a mixture of strands is hybridized to a 3' ordinary
array. The immobilized oligos are then used as primers for
copying the hybridized strands, beginning at the location within
each bound strand where hybridization occurred, and ending at the
upstream terminus of each bound strand. After extension of the
immobilized oligos, the hybridized parental strands are discarded.
At this point the wells contain immobilized (complementary)
partial strands. The partials in one well all share a
5'-terminal oligo segment that is complementary to a particular
internal oligo in the parental strand(s). The partials
have 3'-terminal sequences that include the complement of the
5'-terminal region of the parental strand(s) (which contains a
priming region). [...] before sorting, the immobilized complementary
partials will contain a priming region at only one end and therefore
cannot be amplified exponentially. However, their linear amplification is
possible, with the partials being synthesized as DNAs or RNAs.
Where RNA partials are generated, the priming region at the
partial copy's 3' terminus contains an RNA polymerase promoter.
Synthesis of RNA copies is more efficient than linear synthesis
of DNA copies. Alternatively, the synthesized copies can be
provided with second priming regions and can then be amplified in
an exponential manner by PCR. This approach is illustrated,
schematically, in Figure 5.

Figure 5 illustrates the generation of partials for one DNA
parental strand 30 on a 3' sectioned ordinary array. First the
strand 30 (many copies, of course, such as [...]
of sorting array 13) is hybridized to the partialing array 31, a
3' sectioned ordinary array, containing well 31a. The parental
strand 30 binds to many different locations within the array,
dependent on which oligo segments are present in the strand. A
hybrid 32 is formed in each well of the array that contains an

In surveying, strands first can be randomly degraded into
pieces whose average length slightly exceeds the surveyed length. 
After degradation, each resulting nucleic acid piece is ligated 
to the same type of oligo (i.e., a constant sequence), that 
preferably does not occur anywhere in the internal regions of the 
pieces. For example, the sequence of the added oligo can contain 
the recognition site of a restriction endonuclease that was used 
to digest the DNA prior to fragment sorting. The ligation can be 
carried out in solution prior to hybridization, or after hybridi- 
zation of the pieces to binary immobilized oligos whose constant 
segment is complementary to the oligo to be ligated. Preferably, 
a 3' array is used, having upstream constant segments. The 
immobilized oligos can then be extended with an appropriate DNA 
polymerase, using the hybridized nucleic acid pieces as 
templates. It is preferable that after extension all hybrids 
have the same length. This can be achieved by employing dideoxynucleotides
as substrates for the polymerase, to restrict extension
to one nucleotide.

Hybrids can be labeled in both a ligation-dependent and an
extension-dependent manner to increase the specificity of hybrid 
detection. Also, the ligated oligos and the added dideoxy- 
nucleotides can be tagged with different labels, for example, 
fluorescent dyes of different colors. The array is then scanned 
at two different wavelengths, and only those areas that emit 
fluorescence of both colors indicate perfect hybrids. 

Survey results can be improved further by hybrid proofreading,
by destroying hybrids containing mismatches, and by
using chemical or enzymatic methods. 

V. Use of the oligonucleotide arrays for the sequencing of 
nucleic acids 

The arrays and methods of this invention can be used to 
determine the nucleotide sequence of nucleic acids, including the 
sequence of an entire genome, whether it is haploid or diploid. 
This embodiment requires neither cloning of fragments nor prelim- 
inary mapping of chromosomes. It is especially significant that 



WO 93/17126

PCT/US93/01552

our method avoids [...] intersite




segments of sorted fragments from different digests. This is 
done to determine the order of the fragments relative to one 
another without regard to differences between allelic pairs of 
fragments. In the embodiment shown in Figure 6 this surveying is 
performed with printed sheets 47, 48 that have been printed with 
a pattern of miniature arrays 45, 46. 

To allocate the ordered allelic fragments to their respec- 
tive chromosomes in a diploid organism, fragments are linked 
according to their allelic differences. In the embodiment 
illustrated in Figure 6, the strands from selected wells of the 
sorting array 44 are transferred to a selected well of one of a 
series of partialing arrays 49, partials are generated, and the 
partials are surveyed using miniature survey arrays 50 on printed
sheets 51. Only the presence of oligos containing allelic
differences in the selected partials needs to be determined to
link a pair of allelic fragments to their respective neighboring 
allelic fragments. 

When sorting according to the identity of terminal sequen- 
ces, each strand occupies a particular "address" in the array. 
It is convenient to think of the address as the oligo sequence 
within a strand that directs the DNA strand to hybridize to a 
particular location, i.e., the sequence that is perfectly com- 
plementary to the variable sequence of the oligo immobilized at 
that location. The "address" also identifies the location within
the array where the DNA binds. 

After sorting, each group of strands is amplified and 
subjected to partialing. Importantly, the isolation of 
individual strands is not necessary, because our method allows 
the nucleotide sequence of each strand in a mixture to be deter- 
mined. In particular, our method allows the sequences of strands 
in a well of the sorting array to be determined, separately from 
mixtures of strands in other wells. In a preferred embodiment, 
the partialing array is comprehensive in order to obtain all 
possible one-sided partials. Each
group of partials is amplified prior to surveying. Most prefer- 
ably, the amplification is carried out in such a manner that one 

to the variable segment of an immobilized oligo. The "address"
also relates to the location within the array where the partial
strand is found, since the variable segment of the oligo immobilized
in that well is complementary to the oligo at the partial's
variable terminus. The "address" also relates to the location
within the parental strand of a partial's terminal oligo. The
location of this "address oligo" within a parental strand is 
characterized by an "upstream subset" of oligos that come before 
it in the parental sequence and by a "downstream subset" of 
oligos that come after it. 

Our method of establishing nucleic acid sequences, for 
either a single strand or a group of parental strands sorted by 
their terminal sequences, begins by assembling an "address set" 
for each address in the partialing array. The "address set" is a 
comprehensive list of all oligos in all the parental strands 
which have the address oligo within their nucleotide sequences. 
The "upstream subset" contains all the oligos that occur upstream 
(i.e., towards the 5' end) of the address oligo in parental 
strands that contain the address oligo. The "downstream subset" 
contains all the oligos that occur downstream (i.e., towards the 
3' end) of the address oligo in any parental strands that contain
the address oligo. Together the two subsets form the "address
set."

The upstream subset of each address can be determined 
directly from the survey of each well of a partialing array and 
consists of a list of all the oligos identified as being present 
in the partial strands in that well. The downstream subset of 
each address can be inferred by examining the upstream subsets of 
all the addresses: the downstream subset of a particular address 
consists of those addresses whose own upstream subset includes 
that particular address oligo. 
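
The relationship between upstream and downstream subsets can be sketched in Python as follows. The sketch simulates an ideal survey directly from known parental strands, which is of course not how the data would arise in practice, and it makes one explicit assumption: the address oligo at the terminus of a partial is not itself counted in that partial's upstream subset. The function names are hypothetical.

```python
def upstream_subsets(parents, n):
    """Simulate the survey: map each address oligo to its upstream subset.

    For every occurrence of an n-long address oligo in a parental strand
    (written 5'->3'), collect all oligos that start before that occurrence.
    """
    up = {}
    for strand in parents:
        for i in range(len(strand) - n + 1):
            address = strand[i:i + n]
            before = {strand[j:j + n] for j in range(i)}
            up.setdefault(address, set()).update(before)
    return up


def downstream_subsets(up):
    """Infer downstream subsets from the upstream subsets alone.

    An address b belongs to the downstream subset of a whenever a appears
    in the upstream subset of b.
    """
    down = {address: set() for address in up}
    for b, upstream in up.items():
        for a in upstream:
            down.setdefault(a, set()).add(b)
    return down
```

For the single strand `ACGTA` surveyed with 3-long oligos, the upstream subset of `GTA` is `{ACG, CGT}`, and inverting all the upstream subsets gives `{CGT, GTA}` as the downstream subset of `ACG`.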

The upstream subset and the downstream subset of a par- 
ticular address, taken together, are an "indexed address set". 
If an oligo occurs more than once in a strand, it can occur in 
both the upstream and the downstream subsets of an address. 
Indexed address sets provide the information required to order 
the oligos contained in a strand set, as will be described below. 

A sequence block can occur either once in a sequence, or
more than once, and this we determine by examining the block 
sets. If a block occurs more than once in a sequence, it will 
always be contained in both its own upstream and downstream 
subsets. On the other hand, if a block occurs only once in a 
sequence, it may or may not be present in its own upstream or 
downstream subset. But, if a block is absent from either its 
upstream subset, or from its downstream subset, that block occurs in
the strand only once. The relative order of these "unique"
blocks can be determined by noting which of them occur in the 
upstream subset, and which of them occur in the downstream 
subset, of the others. Once the unique blocks have been ordered 
relative to each other, the gaps between them are filled with 
blocks that may be non-unique. However, not every gap can 
necessarily be filled in with a particular block. There is a 
range of locations within which each non-unique block (or 
presumably-non-unique block) can be present. The range for a 
particular block is determined by noting those blocks that always 
occur upstream of it, and those blocks that always occur downstream
of it. A gap can be filled in if, and only if, there is a
block or a combination of blocks, whose outer ends have n-1 
nucleotide-long perfect sequence overlaps with the ends of the 
blocks that form the gap. Because at least two overlaps, each of 
low probability, must occur simultaneously, it is highly unlikely 
that more than one block, or one combination of blocks, can fill 
a gap. If a particular block occurs many times in a strand, it
will have to be used to fill every gap it matches. This is why, 
using the method of the invention, it is possible to establish 
the sequence of a strand without measuring how many times an 
oligo occurs in the partials. It is only necessary to determine
whether an oligo is present or not. 
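
The test for unique blocks and their relative ordering can be illustrated with a minimal Python sketch. It works on individual oligos rather than on assembled blocks, considers only a single strand, and takes the upstream and downstream subsets as given dictionaries; the function names and this simplification are assumptions made for illustration.

```python
def unique_oligos(up, down):
    """Oligos absent from their own upstream or own downstream subset.

    up, down -- dicts mapping each address oligo to a set of oligos.
    Such oligos occur only once in the strand.
    """
    return [a for a in up
            if a not in up[a] or a not in down.get(a, set())]


def order_unique(unique, up):
    """Order unique oligos 5'->3' for a single strand.

    A unique oligo preceded by k other unique oligos has exactly k of them
    in its upstream subset, so that count serves as its rank.
    """
    unique_set = set(unique)
    return sorted(unique, key=lambda a: len(up[a] & unique_set))
```

Only presence or absence of each oligo in a subset is used; no count of how many times an oligo occurs is needed, in keeping with the passage above.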

An important aspect of this invention is the ability to 
sequence a mixture of strands simultaneously. The invention can 
be used for the determination of fragment sequences from an 
entire fragmented and sorted genome. 

If one strand is being sequenced, all address sets deter- 
mined from a partialing array will contain the same oligos that 

contiguous sequences can be accomplished by identifying each
fragment's immediate neighbors. One method for obtaining this 
information is to use another restriction enzyme to cleave the 
same DNA at different positions, thus producing a set of frag- 
ments that partially overlap neighboring fragments from the first 
digest, and then to sequence these fragments. However, it is not 
necessary to sequence the fragments in the second restriction 
digest. It is only necessary to uniquely identify overlapping 
segments in the fragments from alternate restriction digests. 
This can be done by surveying "signatures". 

Signatures can be determined by hybridization of fragment 
strands to complementary oligo probes. A signature of a fragment 
may consist of one, two or more oligos, so long as it is unique 
within the sequence analyzed. Neighboring fragments from one 
restriction digest can be determined by looking for their signa- 
tures in overlapping fragments from an alternate digest. 

We have devised a method for identifying neighboring 
restriction fragments among the list of sequenced fragments that 
does not require either cloning or sequencing of overlapping 
fragments. If strands from an alternate digest are sorted, 
complementary strands of the same fragment will hybridize to 
different addresses in the sorting array. Whenever intersite
segments from two or more fragments of the first digest are 
present within one fragment of the second digest, then all of 
these segments will be represented in both complementary strands 
of that one fragment, and all will be present wherever those 
strands bind in a sorting array. We identify the segments by 
obtaining their signatures through hybridization to specialized 
binary survey arrays. The signatures of intersite segments that 
occur in one fragment always accompany each other, whereas 
signatures of distant segments travel independently. 

After the fragments from an original (first) restriction 
digest of a long DNA have been sequenced, the same DNA is 
digested with a second (different) restriction endonuclease, the 
termini of the generated fragments are provided with universal 
priming regions (that also restore the recognition sites at the 
termini), and the strands are sorted according to particular
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internal sequences, namely, a variable sequence adjacent to the 
recognition site for the first restriction enzyme. The sorting 
array is a sectioned binary array. It contains immobilized 
oligos having a variable sequence as well as an adjacent constant 
sequence that is complementary to the recognition sequence of the 
first restriction endonuclease. The sorted strands are amplified
by "symmetric" PCR, so that in each well where a strand has been 
bound, copies of the bound strand, as well as complements, are 
generated. In another embodiment, strands can be sorted accord- 
ing to their terminal sequences on an array whose oligos' con- 
stant segments include sequences that are complementary to the 
recognition site of the second restriction enzyme. This alterna-
tive is not detailed here; it corresponds to the embodiment
discussed below, except that sorting is by terminal sequence.

Each strand that hybridizes to the binary sorting array will 
possess at least two recognition sites for the second restriction 
enzyme (restored at the strand's termini), and at least one 
(internal) recognition site for the first restriction enzyme. 
The segments included between these two types of restriction 
sites (intersite segments) comprise the overlaps between the two
types of restriction fragments, and each intersite segment is
thus bounded by any two restriction sites of the two types. It
follows that each of these segments can be characterized by
identifying these two restriction sites and variable sequences of
preselected length within the segment that are immediately
adjacent to each of the restriction sites. The combination of a
recognition site (for either the first or the second restriction
enzyme) and its adjacent variable oligo we call a "signature
oligonucleotide". Every intersite segment can be characterized 
by two signature oligos (of either type) that bound that segment. 
The combination of the two signature oligos is defined herein as 
the intersite segment's "signature". 
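
As an illustration only, the definition of a signature can be sketched as follows. The sequence is made up, the two recognition sites (GGATCC and GAATTC) are chosen merely as examples, the variable length k = 4 is an assumption, and the function names are hypothetical. The sketch reads signatures directly from a known sequence, whereas in the method they are obtained by hybridization.

```python
import re


def site_positions(seq, sites):
    """Return sorted (position, site) pairs for every occurrence of each site."""
    hits = []
    for site in sites:
        for m in re.finditer(f"(?={re.escape(site)})", seq):
            hits.append((m.start(), site))
    return sorted(hits)


def intersite_signatures(seq, site1, site2, k):
    """For each pair of neighboring sites, return the two signature oligos
    (site plus adjacent k-nucleotide variable sequence inside the segment)."""
    hits = site_positions(seq, (site1, site2))
    signatures = []
    for (p, left_site), (q, right_site) in zip(hits, hits[1:]):
        start = p + len(left_site)          # first nucleotide of the segment
        if q - start < k:                   # segment too short for a variable oligo
            continue
        left = left_site + seq[start:start + k]
        right = seq[q - k:q] + right_site
        signatures.append((left, right))
    return signatures


# Made-up sequence: site, two 4-nucleotide variable stretches, site, and so on.
seq = "GGATCC" + "AAAC" + "TTTG" + "GAATTC" + "CCCA" + "GGGT" + "GGATCC"
sigs = intersite_signatures(seq, "GAATTC", "GGATCC", k=4)
print(sigs)
```

Each tuple pairs the two signature oligos that bound one intersite segment; together they form that segment's signature.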

After strand amplification, the strands in the wells of the 
sorting array are surveyed to identify the signature oligos of 
each of the two types. This is carried out by using two types of 
binary survey arrays. The first has immobilized oligos contain- 
ing a variable oligo segment and a constant segment that is, or 
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includes, an adjacent sequence that is complementary to the 
recognition site for the first restriction endonuclease. The 
immobilized oligos in the second survey array have a variable
oligo segment of preferably the same length as the variable
segment of the first specialized survey array, and a constant
segment that is, or includes, an adjacent sequence that is com-
plementary to the recognition site for the second restriction
endonuclease. The constant oligo segments in these arrays can be 
located either upstream or downstream of the variable oligo 
segments, resulting in the surveying of either the downstream or 
the upstream signature oligos in each strand of the intersite 
segments being surveyed. In a preferred embodiment the constant 
oligo segments are upstream, and the immobilized oligos have free
3' ends, so that they can be extended by incubation with a DNA
polymerase. From the oligo information that is obtained, the
sequenced fragments can be ordered relative to one another. 

In our method, the uniqueness of a signature is achieved by
surveying "half signatures" (signature oligonucleotides) on two
relatively small survey arrays. If the variable segments in the
arrays are 8 nucleotides long, the number of areas in the two
arrays is approximately 130,000, or approximately 100,000,000 
times smaller than the single array that would be needed for 
detecting the same size signature (28 nucleotides). 
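
The figure of approximately 130,000 areas follows from simple counting, sketched here under the assumption that each survey array holds one area for every possible variable octanucleotide (variable names are illustrative).

```python
VARIABLE_LENGTH = 8                      # nucleotides in the variable segment

# Four possible nucleotides at each variable position.
areas_per_array = 4 ** VARIABLE_LENGTH
areas_two_arrays = 2 * areas_per_array

print(areas_per_array, areas_two_arrays)  # 65536 131072
```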

If a diploid genome (such as a human genome) is sequenced, 
the ordered fragments will appear as a string of unlinked pairs 
of allelic fragments. What remains unknown is how the allelic 
fragments in each pair are distributed between the homologous 
(sister) chromosomes that came from each parent. Allocation of 
the allelic fragments to these "chromosomal linkage groups" 
requires knowledge of which fragment in each pair is linked to 
which fragment in a neighboring pair. 

We have developed a method that uses arrays for allocating 
allelic fragments to chromosomes, irrespective of what method was 
used for sequencing and ordering the fragments. The linkage of 
fragments in neighboring pairs can be achieved by sequencing a 
restriction fragment ("spanning fragment") from an alternate 
digest that spans at least one allelic difference in each pair. 
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Therefore, only one address in the partialing array is needed to 
reveal the linkages between two neighboring allelic pairs. Thus, 
65,536 linkages can be determined on a single comprehensive 
partialing array made of variable octanucleotides. With this
method, only 10 to 20 of these arrays would be needed to complete 
the assembly of an entire diploid human genome that has been 
fragmented by a restriction endonuclease with a hexameric recog- 
nition site. 

Computational methods can be developed to minimize or 
eliminate errors that occur during partialing and surveying, by 
taking advantage of the high redundancy in the data. Such
methods should take into account the following aspects of a
preferred sequencing procedure: the sequence of every fragment
is independently determined four times (by virtue of each strand
and its complement being present at two different addresses in
the sorting array); each strand set is determined in as many
trials as the number of different oligos in that strand; every 
nucleotide in a strand is represented by as many different oligos 
as the length (of the variable segment) of the immobilized oligos 
in the survey array; the locations where a particular block can 
occur in a sequence are limited by the distribution of the blocks 
among the upstream and downstream subsets of each pertinent 
address; and the edges of a block must be compatible with the 
edges of each gap where that block is inserted. 

Using our genome sequencing method, one can use throughout 
essentially the same technology, i.e., hybridization of oligo 
probes and the amplification of nucleic acids by the polymerase 
chain reaction, both of which are well-studied, common laboratory 
techniques. The entire procedure can be performed by a specially 
designed machine, resulting in huge reductions in time and cost, 
and a marked improvement in the reliability of the data. Many 
arrays could be processed simultaneously on such a machine. The 
machine most preferably should be entirely computer-controlled, 
and the computer should constantly analyze intermediate results. 
As stated above, used arrays can be stored, both to serve as a 
permanent record of the results, and to provide additional 



WO 93/17126

PCT/US93/01552

-30- 





PCR. A single array can be used either to mutate many single 
positions in a gene, or to introduce mutations in many genes in 
one procedure. 

Sectioned arrays can also be used for the massively parallel 
testing of the biological effects of the introduced mutations. 
For example, parallel coupled transcription-translation reactions 
can be carried out in the wells of a sectioned array following 
amplification of the mutant strands. It is thus possible to
determine simultaneously, on the same sectioned array, the
effects of many different amino acid substitutions on the struc- 
ture and function of a protein. 

VII. Examples

1. Sorting nucleic acids or their fragments on a binary 
oligonucleotide array whose immobilized oligos have free 3'
termini, with constant upstream segments — 

This method allows the immobilized oligos to serve as 
primers for copying bound strands, resulting in the formation of 
complementary copies covalently linked to the array. 

1.1. Sorting restriction fragments according to their 
terminal sequences, following the introduction of terminal 
priming regions — 

DNA is digested using a restriction endonuclease. Recogni- 
tion sites for the restriction endonuclease are restored in 
solution by introducing terminal extensions (adaptors) that 
contain a sequence which, together with the restored restriction 
site, form a universal priming region at the 3' terminus of every
strand in the digest. This priming region is later used for
amplification by PCR. After melting the fragments, the strands are
sorted on a sectioned binary array. A sequence complementary to 
the generated priming region serves as both the constant segment 
of the immobilized oligos and as the primer for PCR amplification 
of the bound strands. 

DNA to be analyzed is first digested substantially com- 
pletely with a chosen restriction endonuclease, and the fragments 
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side triphosphates are used as substrates for extension of the 3' 
termini. 

After the priming regions have been added, the complementary 
strands are melted apart, such as by increasing temperature 
and/or by introducing denaturing agents such as guanidine iso- 
thiocyanate, urea, or formamide. The resulting strands are 
hybridized to a binary sorting array, such as by following a
standard protocol for the hybridization of DNA to immobilized 
oligos. Hybridization is performed so that formation of only 
perfectly matched hybrids is promoted. The hybrids have a length 
which is equal to that of the immobilized oligos. The immobi- 
lized oligos are attached to the array at their 5' termini and 
contain constant restriction site segments adjacent to a variable 
segment of predetermined length. Each strand will be bound to 
the array at its 3' terminus. Its location within the array will 
be determined by the identity of the oligo segment that is 
located in the strand immediately upstream from the restored 
restriction site at its 3' end, and that is complementary to the 
variable segment of the immobilized oligo to which it is bound. 
After hybridization and washing away all unbound material, the 
entire array is incubated with a DNA polymerase, such as Taq DNA
polymerase or the DNA polymerase of bacteriophage T7, and
deoxyribonucleotide 5'-triphosphates as substrates. As a result, the
3' end of each immobilized oligo to which a strand is bound will
be extended to produce a complementary copy of the bound strand.
The array is vigorously washed. The wells are then filled with a
solution containing universal primer, an appropriate DNA polymer-
ase, and the substrates and buffer needed to carry out PCR. The
array is then sealed, isolating the wells from each other, and 
exponential amplification is carried out, preferably simul- 
taneously, in each well. 

1.2. Sorting restriction fragments according to their 
terminal sequences, with 3' and 5' terminal priming regions being
introduced, one before and one after strand sorting 

This procedure consumes larger amounts of enzymes and 
substrates than the procedure described in Example 1.1; however,
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remove all nucleic acids except the immobilized oligos and the 
ligated strands hybridized to them.

1.3. Sorting RNAs according to their terminal sequences — 
Mature eukaryotic mRNAs share structural features that can
help in their manipulation using arrays. All have a "cap"
structure on their 5' end, and most also possess a 3'-terminal
poly(A) tail, which is attached posttranscriptionally by a
poly(A) polymerase. Because there are usually no long oligo(A)
tracts in the internal regions of cellular RNAs, the poly(A) tail
can serve as a naturally occurring terminal priming sequence in
sorting. The size of mRNAs (several thousand nucleotides in
length) allows them to be amplified and analyzed directly, 
without prior cleavage into fragments. 

There are known methods for preparing essentially undegraded 
total cellular RNA. Total cellular RNA is converted into com- 
plementary DNA (cDNA) using an oligo(dT) primer and a reverse 
transcriptase or Thermus thermophilus DNA polymerase. Then, 
omitting second strand synthesis, single-stranded cDNAs (which 
possess oligo(dT) extensions at their 5' end and variable 3'
termini) are sorted according to their 3' termini on a sectioned
binary array and are ligated there to pre-hybridized adaptors of
a predetermined sequence that are complementary to the immobi-
lized oligos' constant sequence, and that introduce into a cDNA
molecule the 3'-terminal priming site. The cDNA is amplified,
using two primers for PCR: oligo(dT) and an oligo complementary 
to the adaptor. 

2. Preparing partial strands of nucleic acids on oligo- 
nucleotide arrays — 

There are two aspects to this procedure: first, the genera-
tion of partial strands (partials), and second, the sorting of
partials according to their terminal oligo segments. All of the
embodiments described below are based on the following principle:
in generating partials from a strand, one of the original strand 
ends is preserved (it will be referred to as the "fixed" end), 
whereas the other end is truncated to a different extent in the 
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After cleavage of the double-stranded DNA fragments, DNase 
is removed, e.g., by phenol extraction. The (partial) strands 
are then melted apart and are hybridized to a sectioned binary 
array, wherein the immobilized oligos are pre-hybridized with 
shorter complementary 5'-phosphorylated oligos of a constant
sequence that cover (mask) the immobilized oligos except for a 
segment that consists of a variable sequence. Hybridization is 
carried out under conditions that favor the formation of per- 
fectly matched hybrids of a length that is equal to the length of 
the unmasked (variable) segment of the immobilized oligo, and 
that minimize the formation of imperfectly matched hybrids. 
After washing away unbound strands, the bound strands are ligated 
to the masking oligos by incubation with a DNA ligase. The 
ligated masking oligos will themselves serve as the second 
(3'-terminal) priming region of a partial strand. (All the
partials of a strand will share the same 5' priming sequence that
had been introduced into the strand before generation of the
partials). If restriction fragments are to be partialed that
possess some restriction site at their termini and do not possess 
this site internally, it is preferable that the 3' terminal 
priming region added to the partials include that site. This 
increases the specificity of terminal priming during subsequent 
amplification of the partials by PCR. Subsequent extension, 
washing, and amplification steps are as described in Example 1.1.
If the partials are prepared for the purpose of sequence deter- 
mination, asymmetric PCR can be performed. Alternatively, an RNA 
polymerase promoter sequence can be included in one of the two 
primers, and amplified DNA is then transcribed to produce multi-
ple single-stranded RNA copies of one of the two complementary
partial strands. 

2.2. Methods employing chemical degradation of DNA — 
These methods are applicable to both double-stranded and 
single-stranded nucleic acids. Chemical degradation is, in most 
cases, essentially random. It can be performed under conditions 
that destroy secondary structure, and the small size of the 
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nated by washing. This can be done either by ligating the 5' end 
of the bound strand to a single-stranded oligoribonucleotide 
adaptor, or by tailing the immobilized partial copy with a 
homopolynucleotide. The entire array is vigorously washed under 
conditions that remove the original full-length strands and 
essentially all other material not covalently bound. Subsequent 
amplification of the immobilized partials can be carried out in 
different ways, dependent on whether it is desired to use linear 
or exponential amplification. 

Exponential copying results in the generation of partials 
and their complements. For a strand to be exponentially ampli- 
fied by PCR, both of its termini should be provided with a 
priming region, preferably different priming regions. The 
immobilized (complementary) partial contains only one (3'-
terminal) priming region, and a complementary copy produced by
linear copying would also have only one priming region (on its 5'
end). For RNA copies to have a priming region at their 5' ends,
the immobilized partial should have been provided with an RNA
polymerase promoter downstream of its 3' terminal priming region
using the methods described herein. The second priming region
that is needed for exponential amplification can be introduced at
the 3' ends of the complementary copies as follows.

(a) The 3' termini of RNA copies can then be ligated to 
oligoribonucleotide or oligodeoxyribonucleotide adaptors which 
are phosphorylated at their 5' end and whose 3' end is blocked.
Exponential PCR can be performed by utilizing the two primers 
that correspond to the two priming regions, and then incubating 
with Tth DNA polymerase. 

(b) If the amplified copies are DNA, they can be trans- 
ferred, such as by blotting, (after melting them free of the 
immobilized partial) onto a binary array that is a mirror copy of 
the first array in the arrangement of the variable segments of 
its immobilized oligos. The constant segments of this binary 
array are pre-hybridized to masking oligos whose ligation to the 
3' termini of the transferred DNAs (by DNA ligase) results in
generation of the second priming region to permit exponential 
PCR. 
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3.1. Comprehensive surveys of DNA strands — 
Every oligo present in a strand or in a partial, or in a 
group of strands or partials, is surveyed. If a survey of 
partials is performed in order to establish nucleotide sequences, 
it is preferable that each partial be represented by the same 
sense copies. Thus, there should be only one of the complemen- 
tary strands in a sample or the complementary strands should be 
differentiable, e.g., one strand should produce either no de- 
tectable signal or a weaker signal. This can be accomplished by 
amplifying the partials linearly or by the use of asymmetric PCR. 

DNA strands (or partials) to be surveyed are preferably 
digested with nuclease S1 under conditions that destabilize DNA
secondary structure. The digestion conditions are chosen so that 
the DNA pieces produced are as short as possible, but at the same 
time, most are at least one nucleotide longer than the variable 
segment of the oligos immobilized on the binary array. If the 
surveyed strands or partials have been previously sorted and 
amplified on a sectioned array, this degradation procedure can be 
performed simultaneously in each well of that array. Alterna- 
tively, if it is desired to store that array as a master for 
later use, the array can be replicated by blotting onto another 
sectioned array. The DNA is then amplified within the replica 
array by (asymmetric) PCR prior to digestion with nuclease S1.

After digestion, the nuclease is inactivated by, for ex-
ample, heating to 100°C, and the DNA pieces are hybridized to an
array whose immobilized oligos' constant segments are pre-
hybridized to 5'-phosphorylated complementary masking oligos.
Preferably, the constant segment contains a restriction site that 
has been eliminated from the internal regions of the strands 
prior to sorting and is long enough so that its hybrid with the 
masking oligo is preserved during subsequent procedures. 

The array is incubated with DNA ligase to ligate the masking 
oligos to only those hybridized DNA strands (or partials) whose 
3' terminal nucleotide is immediately adjacent to the 5' end of 
the masking oligo, and matches its counterpart in the immobilized 
oligo. DNA ligase is especially sensitive to mismatches at the 
junction site. 
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time, when the conditions ensure the highest selectivity for the 
particular hybrid that forms in that area. Other conditions 
(such as denaturant and/or salt concentration) can also be 
controlled over time. The fluorescence pattern can be recorded 
at predetermined time intervals with a scanning microfluorometer,
such as an epifluorescence microscope.

4. Determination of the nucleotide sequences of strands in 
a mixture when each strand possesses at least one oligo that does 
not occur in any other strand in the mixture — 

Figures 8 to 11 depict the determination of the sequences of 
two mixed strands using the methods of the invention. The 
example demonstrates the power of the invention to identify all 
the oligos present in a strand (i.e., its strand set) when it 
possesses at least one oligo that does not occur in any other 
strand in the mixture. In particular, the example demonstrates: 
(a) how the data obtained by surveying the partial strands 
generated from a mixture of strands and sorted by their variable 
termini (i.e., the upstream subset of each address) and the 
inferred downstream subset of each address (which together form 
the indexed address sets) are used to construct the unindexed 
address sets; and (b) how the unindexed address sets are compared 
to each other to identify prime sets. The example also demon-
strates how the oligos contained in a strand set are assembled
into the sequence of the strand, even though the primary data is 
obtained from a mixture. In particular, the example demon- 
strates: (a) how oligos in a strand set are assembled into 
sequence blocks; (b) how the contents of the indexed address sets 
are filtered so that only information pertaining to the oligos in 
a particular strand set remains; (c) how this filtered data is 
re-expressed in terms of the sequence blocks that are contained 
in that particular strand; (d) how information in the resulting 
"block sets" is used to identify those blocks that definitely
occur only once in the strand ("unique blocks") and to identify 
those that can potentially occur more than once; (e) how informa- 
tion in block sets of unique blocks is used to determine the 
relative order of the blocks that occur only once in the strand; 
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upstream subset of the address oligo "CCT". This means that 
oligo "CCT" occurs downstream of oligo "ACC" in at least one
strand in the mixture. Therefore "CCT" is inferred to be in the
downstream subset of address set "ACC". The remaining downstream
oligos in all of the address sets are similarly inferred. Note 
that an address oligo is a member of its own upstream and down- 
stream subsets. 
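
The inference of downstream subsets from the measured upstream subsets can be sketched as follows. The upstream subsets shown are made-up values for a single hypothetical strand "ACCTG" surveyed with 3-nucleotide oligos; the function name is illustrative.

```python
def infer_downstream(upstream):
    """If oligo X is in the upstream subset of address Y,
    then Y is placed in the downstream subset of address X."""
    downstream = {addr: set() for addr in upstream}
    for addr, ups in upstream.items():
        for oligo in ups:
            downstream.setdefault(oligo, set()).add(addr)
    return downstream


# Hypothetical measured upstream subsets (each address is in its own subset).
upstream = {
    "ACC": {"ACC"},
    "CCT": {"ACC", "CCT"},
    "CTG": {"ACC", "CCT", "CTG"},
}
downstream = infer_downstream(upstream)
print(downstream["ACC"])
```

Because every address oligo belongs to its own upstream subset, the same rule automatically places it in its own downstream subset.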

After the indexed address sets of all addresses in the 
parental strands have been determined (as shown in Figure 8b),
the information is organized into unindexed address sets (Figure
8c), having no division into downstream and upstream subsets, but
merely listing, for each address oligo, those oligos that occur
in either the upstream or downstream subset (or in both). In
Figure 8c, the address oligos (bold letters) are listed verti-
cally on the left side of the diagram. Note that the address
oligo is a member of its own unindexed address set. 

Unindexed address sets are grouped together according to the 
identity of the oligos they contain (Figure 8d). Unindexed
address sets that contain an identical set of oligos are grouped 
together. It can be seen that three groups of address sets are 
formed in this example. The groups are identified by the Roman
numerals (I, II, and III). The address oligos of each group (for 
example, CTA, GTC, and TCC in group II) always occur together in 
a strand and can occur together in more than one strand. 

Each group of identical address sets is then compared to all
other groups of identical address sets to see whether its common
address set appears to be a prime address set, i.e., whether no other
address set is a subset of it. For example, in Figure 8d, the
address set common to group III is not a prime address set, 
because the address set common to group I is a subset of the 
address set common to group III. However, the address set common 
to group I and the address set common to group II appear to be 
prime address sets. 
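
The construction of unindexed address sets, their grouping, and the search for putative prime address sets can be sketched as follows. The indexed address sets are made-up values for a hypothetical mixture of two strands ("ACGTA" and "TTCGTC") that share the oligo "CGT"; they are not the data of Figure 8, and the function names are illustrative.

```python
def unindexed_sets(indexed):
    """Merge the upstream and downstream subsets of every address."""
    return {a: frozenset(up) | frozenset(down) for a, (up, down) in indexed.items()}


def group_identical(unindexed):
    """Group address oligos whose unindexed address sets are identical."""
    groups = {}
    for addr, oligos in unindexed.items():
        groups.setdefault(oligos, []).append(addr)
    return groups


def putative_primes(groups):
    """Keep the address sets of which no other address set is a proper subset."""
    return [s for s in groups if not any(other < s for other in groups)]


# Hypothetical indexed address sets: address -> (upstream, downstream).
indexed = {
    "ACG": ({"ACG"}, {"ACG", "CGT", "GTA"}),
    "GTA": ({"ACG", "CGT", "GTA"}, {"GTA"}),
    "TTC": ({"TTC"}, {"TTC", "TCG", "CGT", "GTC"}),
    "TCG": ({"TTC", "TCG"}, {"TCG", "CGT", "GTC"}),
    "GTC": ({"TTC", "TCG", "CGT", "GTC"}, {"GTC"}),
    "CGT": ({"ACG", "CGT", "TTC", "TCG"}, {"CGT", "GTA", "GTC"}),
}
groups = group_identical(unindexed_sets(indexed))
primes = putative_primes(groups)
print(len(groups), [sorted(p) for p in primes])
```

In this made-up mixture three groups are formed; the address set of the shared oligo "CGT" contains the other two and is therefore not a putative prime address set.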

Each putative prime address set is then tested to see if it 
is a strand set by examining all the address sets that contain 
all of the oligos that are present in it. For example, in Figure 
9a, all the address sets that contain all the oligos present in 
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the putative prime address set common to group I are listed 
together (namely the address sets contained in groups I and III).
The address oligos are shown in bold letters on the left side of 
the diagram, and the groups are identified by Roman numerals. 
The address set common to group I is indeed a prime address set 
(and therefore it contains a single strand set) because a list of 
the eleven oligos that are found in every address set in the 
diagram (they are seen as full columns) is identical to the list 
of eleven addresses on the left side of the diagram. Similarly, 
Figure 9b shows why the address set common to group II is also a
prime set. The twelve oligos common to every address set in the 
diagram are all found in the list of twelve addresses on the left 
side of the diagram. Had either of these putative prime address 
sets not turned out to be a prime set (by the criterion described 
above), then it would have been identified as a pseudo-prime
address set, and further analysis would have been required to 
decompose it into its constituent strand sets. 
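
The test applied to a putative prime address set can be sketched as follows: the oligos common to every address set that contains the candidate are compared with the list of the corresponding addresses. The unindexed address sets are made-up values for a hypothetical two-strand mixture, and the function name is illustrative.

```python
def is_strand_set(candidate, unindexed):
    """True if the oligos found in every address set containing the candidate
    are exactly the address oligos of those address sets."""
    containing = {a: frozenset(s) for a, s in unindexed.items() if candidate <= s}
    common = frozenset.intersection(*containing.values())
    return common == frozenset(containing)


# Hypothetical unindexed address sets: address -> oligos.
set_a = frozenset({"ACG", "CGT", "GTA"})
set_b = frozenset({"TTC", "TCG", "CGT", "GTC"})
unindexed = {
    "ACG": set_a, "GTA": set_a,
    "TTC": set_b, "TCG": set_b, "GTC": set_b,
    "CGT": set_a | set_b,            # the shared oligo sees both strands
}
print(is_strand_set(set_a, unindexed), is_strand_set(set_b, unindexed))
```

A candidate that fails this comparison would correspond to a pseudo-prime address set in the sense described above.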

Once the strand sets in a mixture have been identified, the 
oligos in each strand set can be assembled into the strand 
sequence in a series of steps, as illustrated in Figure 10 (which 
utilizes the strand set determined in Figure 9a).

First the oligos in the strand set are assembled into 
sequence blocks. A sequence block contains one or more uniquely 
overlapping oligos. Two oligos of length n uniquely overlap
each other if they share an identical sub-sequence that is n-1
nucleotides long and no other oligos in the same strand set share
that sub-sequence. For example, for the strand set shown in
Figure 10a, the oligos "CAT" and "ATG" share the sub-sequence
"AT" which does not occur in other oligos. These two oligos
therefore uniquely overlap to form the sequence block "CATG", as
shown in Figure 10b. Similarly, oligo "TGG" uniquely overlaps
oligo "GGT" by the common sub-sequence "GG", and oligo "GGT" also
uniquely overlaps (on its other end) oligo "GTA" by the common
sub-sequence "GT". Thus, the three oligos ("TGG", "GGT", and
"GTA") can be maximally overlapped to form sequence block 
"TGGTA". In forming sequence blocks, the following rule is 
adhered to: two oligos can be included in the same block if they 
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are the only oligos in the strand set to possess their common 
sub-sequence. Thus, "ATG" does not uniquely overlap "TGG", 
because the strand set contains a third oligo, "TTG", that shares
the common sub-sequence "TG". If, following these rules, an
oligo does not uniquely overlap any other oligo, then a sequence 
block consists of only that oligo. For example, "TAA" forms its 
own block. Following the above rules, the eleven oligos that 
occur in strand set A can be assembled into four sequence blocks. 
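
The block-forming rule can be sketched as follows. The eleven oligos are those named in this example and are assumed here to make up strand set A; the function name is illustrative, and the sketch assumes all oligos have the same length.

```python
def assemble_blocks(oligos):
    """Chain oligos that uniquely overlap by n-1 nucleotides into sequence blocks."""
    oligos = sorted(set(oligos))
    # Map each (n-1)-nucleotide sub-sequence to the oligos that possess it.
    holders = {}
    for o in oligos:
        for sub in (o[:-1], o[1:]):
            holders.setdefault(sub, set()).add(o)
    # Link two oligos only if they alone possess their common sub-sequence.
    nxt, prev = {}, {}
    for sub, hs in holders.items():
        if len(hs) != 2:
            continue
        a, b = sorted(hs)
        for left, right in ((a, b), (b, a)):
            if left.endswith(sub) and right.startswith(sub):
                nxt[left], prev[right] = right, left
    # Walk each chain from an oligo that has no unique upstream neighbor.
    blocks, seen = [], set()
    starts = [o for o in oligos if o not in prev]
    for start in starts + oligos:        # second pass covers any closed cycles
        if start in seen:
            continue
        block, cur = start, start
        seen.add(cur)
        while cur in nxt and nxt[cur] not in seen:
            cur = nxt[cur]
            seen.add(cur)
            block += cur[-1]
        blocks.append(block)
    return blocks


strand_set_a = ["CAT", "ATG", "TGG", "GGT", "GTA", "TAA",
                "TAC", "ACC", "CCT", "CTT", "TTG"]
blocks = assemble_blocks(strand_set_a)
print(blocks)
```

The sub-sequences "TG" and "TA" are each possessed by three oligos, so no link is made across them, and the chains end at those points.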

Second, the data contained in the indexed address sets shown 
in Figure 8b are filtered to remove extraneous information that 
does not pertain to strand set A. Figure 10c shows the resulting 
filtered address sets. All address sets whose address oligo is 
not one of the oligos in strand set A are eliminated. In addi- 
tion, all oligos that are not members of strand set A are removed 
from the upstream and downstream subsets of the remaining address 
sets. The resulting filtered address sets are then grouped 
together according to the oligos that are contained in each 
block. For example, the filtered address sets for address oligos
"CAT" and "ATG" have been grouped together in Figure 10c because
these two oligos are contained in sequence block "CATG". In
Figure 10c, the address oligos found in the same block are 
identified by rectangular boxes. In addition, oligos that occur 
in the same block are grouped together within each upstream and 
downstream subset. 
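
The filtering step can be sketched as follows. The indexed address sets are made-up values for a hypothetical two-strand mixture (not the data of Figure 8b), and the function name is illustrative.

```python
def filter_address_sets(indexed, strand_set):
    """Drop addresses outside the strand set, and drop foreign oligos
    from the upstream and downstream subsets that remain."""
    return {
        addr: (up & strand_set, down & strand_set)
        for addr, (up, down) in indexed.items()
        if addr in strand_set
    }


# Hypothetical indexed address sets: address -> (upstream, downstream).
indexed = {
    "ACG": ({"ACG"}, {"ACG", "CGT", "GTA"}),
    "GTA": ({"ACG", "CGT", "GTA"}, {"GTA"}),
    "TTC": ({"TTC"}, {"TTC", "TCG", "CGT", "GTC"}),
    "CGT": ({"ACG", "CGT", "TTC", "TCG"}, {"CGT", "GTA", "GTC"}),
}
strand_set = {"ACG", "CGT", "GTA"}
filtered = filter_address_sets(indexed, strand_set)
print(sorted(filtered))
```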

Third, the filtered address sets are converted into block
sets, as shown in Figure 10d. In a block set, the information
from different address sets is combined. Instead of a different 
horizontal line for each filtered address set that pertains to a 
particular block, the information in all of the address sets that 
pertain to that particular block is combined into a single 
horizontal line. For example, in Figure 10c, five different
filtered address sets pertain to sequence block "TACCTTG". In
Figure 10d, these five lines are combined into a single line in
which the address oligos are replaced by an "address block", 
shown as "TACCTTG" surrounded by a bold box. Similarly, the 
upstream oligos are replaced by upstream blocks, and the downstream
oligos are replaced by downstream blocks. In substituting
sequence blocks for the upstream (or downstream) oligos that are
contained in the filtered address sets for a given address block,
the following rule is adhered to: a sequence block only occurs
in the upstream subset (or in the downstream subset) of an
address block if every oligo that is contained in that sequence
block occurs in the upstream (or in the downstream) subset of
every filtered address set that pertains to that address block.
For example, sequence block "CATG" occurs in the upstream subset
of address block "TACCTTG" because oligos "CAT" and "ATG" occur
in the upstream subset of address oligos "TAC", "ACC", "CCT",
"CTT", and "TTG".
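
The substitution rule can be sketched in Python as shown here. The sketch is illustrative and rests on stated assumptions: `toy_address_sets` is a hypothetical stand-in for measured address sets, built from a made-up strand that happens to contain the same eleven trinucleotides as the four blocks named in the text. It is not asserted to be the strand of the figures.

```python
def kmers(seq, k=3):
    """All overlapping oligos of length k in seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_address_sets(strand, k=3):
    """Hypothetical data source: for each oligo, collect the oligos that lie
    before (upstream) and after (downstream) any of its occurrences."""
    oligos = kmers(strand, k)
    sets = {}
    for i, o in enumerate(oligos):
        up, down = sets.setdefault(o, (set(), set()))
        up.update(oligos[:i])
        down.update(oligos[i + 1:])
    return sets


def block_sets(filtered, blocks, k=3):
    """Combine filtered address sets into one (upstream, downstream) pair per block.

    A block is listed upstream (downstream) of an address block only if all of
    its oligos occur in the upstream (downstream) subset of every address set
    that pertains to the address block.
    """
    members = {b: set(kmers(b, k)) for b in blocks}
    result = {}
    for address_block in blocks:
        rows = [filtered[o] for o in members[address_block]]
        up = {b for b in blocks if all(members[b] <= r[0] for r in rows)}
        down = {b for b in blocks if all(members[b] <= r[1] for r in rows)}
        result[address_block] = (up, down)
    return result


TOY_STRAND = "CATGGTACCTTGGTAA"  # assumption, for illustration only
BLOCKS = ["CATG", "TACCTTG", "TGGTA", "TAA"]

if __name__ == "__main__":
    for block, (up, down) in block_sets(toy_address_sets(TOY_STRAND), BLOCKS).items():
        print(block, sorted(up), sorted(down))
```

With this toy input, "CATG" appears upstream of "TACCTTG", and "TGGTA" appears on both sides of it, which matches the relationships described in the text.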

Often, a sequence block does not occur in its own upstream
or downstream subset. For example, sequence block "CATG" does
not occur in the upstream or downstream subset of its own block
set (i.e., in block set "CATG"), because oligo "ATG" is not
present in the upstream subset of address set "CAT" and oligo
"CAT" is not present in the downstream subset of address set
"ATG". When a sequence block does not occur in its own upstream
or downstream subset, this indicates that that sequence block
occurs only once in the nucleotide sequence of that strand.
However, a sequence block may occur in both the upstream subset
and in the downstream subset of its own block set. For example,
sequence block "TGGTA" occurs in both the upstream subset and in
the downstream subset of block set "TGGTA". When a sequence
block does occur in its own upstream and downstream subsets, it
indicates that the sequence block may, but need not, occur more
than once in the sequence. The presence of more than one parental
strand in the original mixture can introduce additional
oligos into the filtered upstream and downstream subsets that can
cause a block that actually occurs only once in a sequence to
appear in both the upstream and downstream subsets of its own
block set. However, further analysis of the data determines the
multiplicity of each block in the strand (as described below),
thus resolving these uncertainties. For convenience, block sets
that pertain to blocks that definitely occur only once in the
sequence are listed together. For example, in Figure 10d, block
set "CATG" and block set "TACCTTG" are listed together.

Fourth, the position of each sequence block relative to the
other sequence blocks is determined. An examination of the block
sets that pertain to unique blocks (that definitely occur only
once in the sequence of the strand) indicates their relative
positions. For example, in Figure 10d, block set "CATG" indicates
that unique sequence block "TACCTTG" occurs downstream of
unique sequence block "CATG". This is confirmed by block set
"TACCTTG", in which unique sequence block "CATG" occurs upstream
of unique sequence block "TACCTTG". The relative position of the
two unique sequence blocks is indicated in Figure 10e, where the
top line to the left of the arrow shows "CATG" upstream (to the
left) of "TACCTTG". The relative position of the sequence blocks
that can potentially occur more than once in the nucleotide
sequence of the strand is determined from their presence or
absence in the upstream and downstream subsets of other sequence
blocks. For example, sequence block "TAA" occurs in the downstream
subset of block set "CATG" (and does not occur in the
upstream subset of block set "CATG"). Furthermore, sequence
block "TAA" also occurs in the downstream subset of block set
"TACCTTG" (and not in its upstream subset). Therefore, sequence
block "TAA" must occur downstream of both unique sequence blocks
"CATG" and "TACCTTG". This is indicated in Figure 10e, where the
bottom line to the left of the arrow shows "TAA" as occurring
downstream of "CATG" and "TACCTTG". Furthermore, sequence block
"TGGTA" occurs only in the downstream subset of block set "CATG".
Therefore, it must occur downstream of "CATG" in the sequence.
On the other hand, sequence block "TGGTA" occurs in both the
upstream and downstream subsets of block set "TACCTTG". This
indicates that "TGGTA" can potentially occur in the sequence at
positions both upstream and downstream of unique sequence block
"TACCTTG". Finally, "TGGTA" only occurs upstream of "TAA". This
is indicated in Figure 10e, where the bottom line to the left of
the arrow contains a bracket that shows the range of positions at
which "TGGTA" can occur, relative to the positions of the other
sequence blocks. At this point in the analysis, the diagram to
the left of the arrow in Figure 10e contains all the information
obtained that pertains to strand set A.
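
The positional reasoning of this step can be sketched in Python. The sketch is illustrative only. The two block sets are transcribed from the relationships stated in the prose above, any membership the prose does not mention is assumed to be absent, and the function names are hypothetical.

```python
# Block sets for the two unique blocks: block -> (upstream blocks, downstream blocks).
BLOCK_SETS = {
    "CATG": (set(), {"TACCTTG", "TGGTA", "TAA"}),
    "TACCTTG": ({"CATG", "TGGTA"}, {"TGGTA", "TAA"}),
}


def unique_blocks(block_sets):
    """Blocks that occur in neither their own upstream nor downstream subset."""
    return [b for b, (up, down) in block_sets.items()
            if b not in up and b not in down]


def order_unique(block_sets):
    """Order unique blocks 5' to 3' by counting the unique blocks upstream of each.

    This simple count assumes the unique blocks can be fully ordered.
    """
    uniq = set(unique_blocks(block_sets))
    return sorted(uniq, key=lambda b: len(block_sets[b][0] & uniq))


def relation(block_sets, reference, other):
    """Where 'other' may lie relative to 'reference', judged from reference's block set."""
    up, down = block_sets[reference]
    if other in up and other in down:
        return "either side"
    if other in up:
        return "upstream"
    if other in down:
        return "downstream"
    return "unknown"


if __name__ == "__main__":
    print(order_unique(BLOCK_SETS))
    for block in ("TAA", "TGGTA"):
        print(block, {ref: relation(BLOCK_SETS, ref, block) for ref in BLOCK_SETS})
```

With these inputs, "TAA" is reported downstream of both unique blocks, while "TGGTA" is downstream of "CATG" and on either side of "TACCTTG", which corresponds to the bracketed range described for the diagram.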
in the sequence of strand B (namely, sequence blocks "CTTG",
"GTCC", and "TACC"). An examination of block set "GTCC" shows
that "GTCC" occurs upstream of "CTTG" and "TACC". However, an
examination of block set "CTTG" and an examination of block set
"TACC" indicates that sequence blocks "CTTG" and "TACC" can both
occur upstream and downstream of each other, which appears to
conflict with the observation that these sequence blocks only
occur once in the sequence of strand B. There is actually no
conflict. Each of these sequence blocks does indeed occur only
once; it is just that their positions, relative to one another,
in strand B are obscured by the presence of conflicting information
from the relative positions of oligos that occur in strand
A. This ambiguity (indicated by the identical positions of
sequence blocks "CTTG" and "TACC" in the diagram to the left of
the arrow in Figure 11e) is resolved by the remainder of the
information. The positions of those sequence blocks that can
potentially occur more than once in the sequence of strand B are
determined from other block sets. First, the block sets of the
sequence blocks that definitely occur only once in the sequence
(namely, block sets "CTTG", "GTCC", and "TACC") are consulted.
The range of positions at which these other sequence blocks can
occur (relative to the positions of other blocks) is indicated in
the diagram to the left side of the arrow in Figure 11e.

The assembly of the nucleotide sequence of strand B proceeds
as follows: "ATG" is upstream of all other blocks. The uniquely
occurring block immediately downstream of "ATG" is "GTCC". "ATG"
and "GTCC" cannot be directly joined. However, "ATG" can be
directly joined to "TGGT", so the correct order is to join "ATG"
to "TGGT", and to join "TGGT" to "GTCC". Neither "CTTG" nor
"TACC" can be directly joined to "GTCC". Three different
sequence blocks can be used to bridge this gap (namely, "CCT",
"GTA", and "TGGT"). The only combination of these three sequence
blocks that can fill this gap is "CCT" alone, which bridges the
gap between "GTCC" and "CTTG". This resolves the ambiguity as to
the relative positions of "CTTG" and "TACC". "CTTG" is therefore
upstream of "TACC". "CTTG" cannot be directly joined to "TACC".
Again, there are three different sequence blocks that can be used

5.1. Cleavable primers

Amplification of strands and partials following separation 
(or generation) on a sectioned array requires that their ends be 
provided with priming regions. The priming regions can be 
undesirable in subsequent use, such as the making of recombinants 
or site-directed mutants. For some uses it is desirable to 
substitute new priming regions for the old. For those uses, the 
primers used for amplification must first be removed from the 5'
ends.

Where the junction of the primer and the strand is contained
within a unique restriction site, the primer can be removed by
treating a double-stranded version of the strand with a corresponding
restriction endonuclease. However, restriction sites
will often not be present at the junctions. A solution to this
problem is to make the primer (or even only the junction nucleotide
in the primer) chemically different from the rest of the
strand. The primer in these examples resides at the strand's 5'
terminus.

5.1.1. Cleavage of primers by alkaline hydrolysis or by
ribonuclease digestion

This method is suitable for removal of oligoribonucleotide 
primers, or mixed RNA/DNA primers whose 3' terminal nucleotide
(which becomes a junction nucleotide upon primer extension) is a 
ribonucleotide. Such primers are incorporated at the 5' end of 
DNA strands or partials during amplification. 

Alkaline hydrolysis cleaves a phosphodiester bond that is on 
the 3' side of a ribonucleotide, and leaves intact a phospho- 
diester bond that is on the 3' side of a deoxyribonucleotide. 
After alkaline hydrolysis, the pH of the reaction mixture is 
returned to a neutral value by the addition of acid, and the 
sample can be used without purification. Primers containing a
riboadenylate or a riboguanylate residue at their 3' end can
effectively be removed from a DNA strand or partial by treatment
with T2 ribonuclease. After treatment, the sample is heated to
100 °C to inactivate the ribonuclease, and can be used without
purification. In both these cases, the released 5' terminus of

terminal sequence of one of the moieties to be ligated, and the
other complementary to the 5'-terminal sequence of the other
moiety to be ligated. The immobilized oligos can have either
free 3' or 5' ends. The relevant termini of the moieties to be
ligated should be deprived of priming regions, but priming
regions (preferably different) should be preserved at the
opposite termini to allow amplification of the recombinants.
After hybridization in an appropriate well, the two nucleic acid
strands are ligated to each other utilizing DNA ligase.
Unligated strands are then washed away. Only ligated strands
possess the two terminal priming regions required for PCR. The
strands that are to be ligated can be used in a mixture with
other strands, provided that no other strands have the same
oligos at the termini deprived of priming regions.

Many different strands can be ligated to one particular 
strand (or partial), to produce many recombinant variations of
one gene. In that case, one portion of the splint (i.e., the
immobilized oligo) is a constant segment, and the other portion is
a variable segment, i.e., a binary array is used. The constant
segment binds to the strand to be included in every recombinant, 
and the variable segment binds to the end of a strand to be fused 
with the invariant strand. 

5.3. Site-directed mutagenesis

The ability to prepare any partial of a strand according to
the invention provides the opportunity to make nucleotide substitutions,
deletions and insertions at any chosen position
within a nucleic acid. Moreover, the use of sectioned arrays 
makes it possible to perform site-directed mutagenesis at a 
number of positions (even at all positions) at once, and in a 
particular embodiment, to determine, within individual wells of 
the array, properties of the encoded mutant proteins. 

Mutations are introduced into a strand by first preparing 
partials having variable ends that correspond to the segment to 
be mutated, that segment preceding the location of the intended 
mutation. Then mutagenic nucleotides or oligos are introduced 
into the variable ends. The mutated partials are then extended

We claim:

1. A binary oligonucleotide array comprising an array of 
predetermined areas on a surface of a solid support, each area 
having therein, covalently linked to said surface, multiple 
copies of a binary oligonucleotide of a predetermined sequence,
said binary oligonucleotide consisting of a constant nucleotide 
sequence adjacent to a variable nucleotide sequence, wherein the 
constant nucleotide sequence is the same for all oligonucleotides 
in the array. 

2. A binary array according to claim 1 wherein the binary 
oligonucleotides consist of deoxyribonucleotides. 

3. A binary array according to claim 1 wherein the binary 
oligonucleotides consist of ribonucleotides. 

4. A binary array according to claim 1 wherein one or more of
the nucleotides of the binary oligonucleotides are modified.

5. A binary array according to claim 1 wherein one or more of
the nucleotides of the binary oligonucleotides are non-standard. 

6. A binary array according to claim 1 wherein the binary 
oligonucleotides are mixed. 

7. A comprehensive binary array according to claim 1.

8. A comprehensive binary array according to claim 7 wherein the 
binary oligonucleotides in each area have variable sequences of 
the same length. 

9. A 3' binary array according to claim 1.

10. A 5' binary array according to claim 1.

11. A 3' binary array according to claim 9, wherein each
covalently linked binary oligonucleotide has its constant
sequence adjacent to the 5' end of its variable sequence.

12. A 5' binary array according to claim 10, wherein each
covalently linked binary oligonucleotide has its constant
sequence adjacent to the 3' end of its variable sequence.

13. A binary array according to claim 2 wherein all or part of
the constant nucleotide sequence is complementary to a predetermined
restriction recognition sequence.

14. A binary array according to claim 1 having an oligonucleotide
hybridized to all or part of the constant sequence
which is ligatable to the terminus of an adjacent nucleic acid
hybridized to the oligonucleotide.

15. In an oligonucleotide array having variable-sequence oligonucleotides
immobilized in a predetermined pattern of areas on a
solid support, the improvement comprising including in said
oligonucleotides a constant sequence of predetermined length.

16. A sectioned binary array according to claim 1.

17. A comprehensive sectioned binary array according to claim 16.

18. A 3' binary oligonucleotide array according to claim 17
wherein each covalently linked binary oligonucleotide has its
variable sequence adjacent to the 5' end of its constant sequence.

19. A 5' binary oligonucleotide array according to claim 17
wherein each covalently linked binary oligonucleotide has its

20. A binary oligonucleotide array according to claim 1, wherein
said constant nucleotide sequence comprises one or more func- 
tional sequences selected from the group consisting of a nucleic 
acid polymerase priming region, an RNA polymerase promoter 
region, and a restriction endonuclease recognition site. 

21. A binary oligonucleotide array according to claim 20, 
wherein said functional sequence is a priming region. 

22. A binary oligonucleotide array according to claim 1, wherein 
each binary oligonucleotide is covalently linked to said surface 
through a long polymer chain. 

23. A binary oligonucleotide according to claim 2, wherein said 
deoxyribonucleotides comprise at least one modified nucleotide. 

24. A sectioned oligonucleotide array comprising an array of 
predetermined areas on a surface of a solid support, each area
having therein, covalently linked to said surface multiple copies 
of an oligonucleotide, wherein said areas are physically separ- 
ated from one another into sections, such that nucleic acids in 
an aqueous solution generated in one section cannot migrate to 
another section. 

25. A sectioned oligonucleotide array according to claim 24 
further comprising a lattice attached to said surface. 

26. A sectioned oligonucleotide array according to claim 25, 
wherein said lattice is removably attached to said surface. 

27. A sectioned oligonucleotide array according to claim 25, 
further comprising a cover removably attachable to said lattice. 

28. A sectioned oligonucleotide array according to claim 24, 
wherein said sections comprise wells in said solid support.

29. A sectioned oligonucleotide array according to claim 28
further comprising a cover removably attachable to said solid 
support. 



30. A sectioned oligonucleotide array according to claim 24 
comprising a gel which physically separates said areas by prev- 
enting nucleic acids in an aqueous solution placed in one area 
from migrating to another area. 

31. A sectioned oligonucleotide array according to claim 24 
wherein said sections are mechanically separated from one
another. 

32. A sectioned oligonucleotide array according to claim 27 
wherein said cover comprises a replica array. 

33. A sectioned oligonucleotide array according to claim 29 
wherein said cover comprises a replica array. 

34. A sectioned array according to claim 24 wherein all of the 
oligonucleotides in individual areas are of the same sequence. 

35. A sectioned array according to claim 24 wherein not all
oligonucleotides in each area are of the same sequence. 

36. A method of sorting a mixture of nucleic acid strands 
comprising the steps of: 

a) providing a solution containing a mixture of nucleic 
acid strands in single-stranded form, and

b) contacting said solution to a first binary oligo- 
nucleotide array of predetermined areas on a surface of a solid 
support, each area having therein, covalently linked to said 
surface, copies of a binary oligonucleotide, said binary oligo- 
nucleotide consisting of a constant nucleotide sequence adjacent 
to a variable nucleotide sequence, wherein the constant nucleotide
sequence is the same for all oligonucleotides in the array,
wherein said step of contacting is carried out under conditions
promoting perfect hybridization of said strands to said binary
oligonucleotides.



37. A method according to claim 36 wherein said array is comprehensive.

38. A method according to claim 36 wherein said array is a 3'
array.

39. A method according to claim 36 wherein said binary oligonucleotides
are complementary to sequences that possibly occur in
the strands in said mixture.

40. A method according to claim 39 wherein said array is comprehensive.



41. A method according to claim 36 wherein said array is a 
sectioned array, further comprising the step of amplifying
strands hybridized in at least some of said areas to produce 
copies of said hybridized strands. 

42. A method according to claim 36 further comprising removing 
strands that have not perfectly hybridized.

43. A method according to claim 42 further comprising adding a 
terminal extension to at least one terminus of the strands, said 
terminal extension having a sequence which substantially does not 
occur in the strands. 

44. A method according to claim 43 wherein a terminal extension 
is added to the strands by ligation of hybridized strands to 
masking oligonucleotides, said masking oligonucleotides being 
also hybridized to said binary oligonucleotides.

45. A method according to claim 44 wherein a second terminal
extension is added to the strands prior to said step of contacting,
said second terminal extension being added to termini
not hybridized to said binary oligonucleotides during said step
of contacting.

46. A method according to claim [illegible] [illegible] on a
sectioned array [illegible] solution without [illegible] of
material in said areas and [illegible] to said binary
oligonucleotides, followed by removing unhybridized strands.

47. A method according to claim [illegible] [illegible] releasing
hybridized strands to solution [illegible].

48. A method according to claim [illegible] wherein said mixture of
nucleic acid strands comprises [illegible].

49. A method according to claim [illegible] wherein said mixture of
nucleic acid strands is comprised of fragments obtained by
site specific degradation.

50. A method according to claim [illegible] wherein said mixture is
comprised of DNA fragments obtained [illegible] region of the binary
oligonucleotide contains the [illegible] restriction
endonuclease recognition [illegible] portion of the
terminal extension [illegible] recognition site.

51. A method according to claim [illegible] [illegible]
complementary copies of hybridized strands.

52. A method according to claim 51 wherein the array is a 3'
array wherein each binary oligonucleotide has its variable
sequence adjacent to the [illegible] end of its constant sequence, and the
copies are generated using a DNA polymerase using the binary
oligonucleotide as a primer.



WO 93/17126 



PCT/US93/01552 



53. A method according to claim 51 wherein the array is a 5' 
array wherein each binary oligonucleotide has its variable 
sequence adjacent to the 3' end of its constant sequence, and the
copies are generated using a DNA polymerase using a primer
hybridized to a 3' terminal extension of the hybridized strands,
and the copies are then ligated to the 5' end of the binary
oligonucleotides.

54. A method according to claim 44 further comprising amplifying 
the hybridized strands. 

55. A method according to claim 51 further comprising removing 
the hybridized strands and amplifying the complementary copies of 
the hybridized strands. 

56. A method according to claim 55 wherein the hybridized 
strands have 3' and 5' terminal extensions, and the amplification
is a polymerase chain reaction. 

57. A method according to claim 55 wherein the hybridized 
strands have a terminal extension and the amplification is
linear.

58. A method according to claim 36 wherein said step of pro- 
viding comprises digesting genomic DNA with a restriction endo- 
nuclease to create DNA fragments; 

(a) modifying said fragments by adding a first constant 
sequence to their strands' 3' termini and a second constant
sequence to their strands' 5' termini to create priming regions
including restored restriction sites; and 

(b) denaturing the modified fragments to form a mixture of 
single nucleic acid strands. 

59. A method according to claim 58 wherein said array is a 
sectioned, comprehensive array, further comprising the step of 
amplifying strands hybridized in said areas by symmetric PCR.

60. A method according to claim 58 further comprising the step
of amplifying said mixture of single nucleic acid strands by
asymmetric PCR.

61. A method according to claim 36 wherein said binary oligonucleotides
or portions thereof are complementary to terminal
sequences that possibly occur in one end of the strands in said
mixture and that are substantially not complementary to internal
sequences in the strands in said mixture.

62. A method according to claim 61 wherein said array is a
sectioned array, further comprising the step of amplifying
strands hybridized in at least some of said areas to produce
amplified copies of said single nucleic acid strands.

63. A method according to claim 62 wherein said array is a
comprehensive array.

64. A method according to claim [illegible] wherein said array is a 3'
array.

65. A method according to claim 61 wherein said step of providing
comprises digesting genomic DNA with a restriction endonuclease
to create DNA fragments, modifying said fragments by
adding a first constant sequence to their strands' 3' termini to
create priming regions including restored restriction sites, and
denaturing the modified fragments into a mixture of single
nucleic acid strands.

66. A method according to claim 61 wherein said step of providing
comprises digesting genomic DNA with a restriction endonuclease
to create DNA fragments;

(a) modifying said fragments by adding a first constant
segment to one of their strands' 3' and 5' termini to create
priming regions including restored restriction sites; and

(b) denaturing the modified fragments into a mixture of
denatured nucleic acid strands each having a priming region only
at one end.

67. A method according to claim 66 wherein said first binary 
sorting array is a 3' array.

68. A method according to claim 67 further comprising the steps 
of 

(a) generating an immobilized copy of each strand hybrid- 
ized to the array by incubation with a DNA polymerase using the 
immobilized oligonucleotide as a primer and a hybridized strand 
as a template; and 

(b) washing to remove from the array all materials not 
covalently bound to the array. 

69. A method according to claim 68, wherein said step of modifying
comprises adding a first constant sequence to their strands'
5' termini and wherein said 3' array contains binary oligonucleotides
to which are hybridized masking oligonucleotides,
further comprising the steps of

(a) ligating said masking oligonucleotides to denatured
nucleic acid strands hybridized to said binary oligonucleotides
such that their 3' termini are immediately adjacent to one of
said masking oligonucleotides, and

(b) washing under conditions such that only strands so 
ligated will remain. 

70. A method according to claim 69 wherein said step of adding a 
first constant sequence includes ligation of a double-stranded 
oligodeoxyribonucleotide adaptor. 

71. A method according to claim 69 wherein said step of adding a 
first constant sequence includes ligation of a single-stranded 
oligoribonucleotide.

72. A method according to claim [illegible] wherein said step of
[illegible] comprises adding [illegible]

plementary to terminal sequences that possibly occur in either
the other ends of said denatured nucleic acid strands or the 
complements of said other ends, and that are not complementary to 
internal sequences in the strands in said mixture or their 
complements. 

81. A method according to claim 61 wherein said step of provid- 
ing comprises digesting genomic DNA with a restriction endo- 
nuclease to create DNA fragments, and denaturing said fragments 
into a mixture of denatured nucleic acid strands. 

82. A method according to claim 81 wherein said first binary
oligonucleotide array is a 3' array containing binary oligonucleotides
to which are hybridized masking oligonucleotides,
further comprising the steps of ligating said masking oligonucleotides
to denatured nucleic acid strands hybridized to said
binary oligonucleotides such that their 3' termini are immediately
adjacent to one of said masking oligonucleotides, washing
under conditions such that only strands so ligated will remain,
and generating an immobilized copy of each ligated strand by
incubation with a DNA polymerase.

83. A method according to claim 82 further comprising the steps 
of adding a constant sequence to the 5' termini of the hybridized 
strands by ligation of a single-stranded oligoribonucleotide;
incubating with a DNA polymerase to extend the immobilized
copies; washing to remove from the array all materials not
covalently bound to the array; and amplifying said washed,
immobilized copies to produce amplified copies.

84. A method according to claim 83 wherein said step of amplify- 
ing comprises PCR. 

85. A method according to claim 83 wherein said first sorting 
array is a comprehensive array.

86. A method according to claim [illegible]

87. A method according to claim [illegible] [illegible] extend
[illegible] immobilized copies [illegible] washed, immobilized
copies to produce amplified copies.

88. A method according to claim 87 wherein said step of
[illegible] comprises [illegible]
95. A method according to claim 61 wherein said nucleic acid
strands are RNA strands.

96. A method according to claim 95 wherein said RNA strands are
eukaryotic mRNA strands, and wherein said step of providing
comprises removing 5'-cap structures.

97. A method according to claim 95 wherein said RNA strands lack
poly(A) tails.

98. A method according to claim 61 wherein said step of providing
comprises digesting genomic DNA with a restriction endonuclease
to create DNA fragments;

(a) modifying said fragments by adding a first constant
sequence to their strands' 3' termini and a second constant
sequence to their strands' 5' termini to create priming regions
including restored restriction sites; and

(b) denaturing the modified fragments into a mixture of
single nucleic acid strands.

99. A method according to claim 98 wherein the 3' priming
regions are complementary to the 5' priming regions.

100. A method according to claim 99 wherein said array is a 3'
array, further comprising the steps of

(a) generating an immobilized copy of each strand hybridized
to the array by incubation with a DNA polymerase; and

(b) washing to remove from the array all materials not
covalently bound to the array.

101. A method according to claim 100 wherein said array is a
sectioned array, further comprising the step of amplifying
[illegible] in said areas by [illegible] to produce
amplified copies of each said immobilized copy.

102. A method according to claim 101 wherein said array is a
comprehensive array. 

103. A method according to claim 99 wherein addition of said 
first constant sequence and said second constant sequence 
includes ligation of a double-stranded oligodeoxyribonucleotide 
adaptor to the strands' 5' termini.

104. A method according to claim 99 wherein addition of said
first constant sequence and said second constant sequence
includes ligation of a single-stranded oligonucleotide to the
strands' 5' termini.

105. A method according to claim 99 wherein addition of said 
first constant sequence and said second constant sequence 
includes enzymatic extension of the strands' 3' termini by the
synthesis of a homopolynucleotide tail.

106. A method according to claim 101 further comprising contacting
said amplified copies from at least one area of said 3'
array to a second binary array under conditions promoting hybridization
of said amplified copies to the binary oligonucleotides
in said second array.

107. A method according to claim 106 wherein said amplified 
copies are produced by symmetric PCR and wherein said second 
array is a 3' array.

108. A method according to claim 106 wherein said first array and 
said second array are comprehensive. 

109. The product of a method according to claim 100. 

110. A method of sorting a mixture of nucleic acid strands 
comprising the steps of 

a) providing a solution containing a mixture of nucleic 
acid strands in single stranded form, and

b) contacting said solution to an oligonucleotide array of
predetermined areas on a surface of a solid support, each area 
having therein copies of an immobilized oligonucleotide, the 
nucleotide sequence of immobilized oligonucleotides in separate 
areas being different, wherein said contacting is performed under 
conditions that promote the formation of perfect hybrids. 

111. A method according to claim 110 wherein said array is 
comprehensive.

112. A method according to claim 110 wherein the array is 
sectioned. 

113. A method according to claim 110 wherein the immobilized
oligonucleotides are between 6 and 30 nucleotides long.

114. A method according to claim 110 wherein the array is a 3' 
array. 

115. A method according to claim 110 wherein the array is a 5' 
array. 

116. In a method wherein two nucleic acid strands are ligated to 
each other in order to form a recombinant product, the improve- 
ment comprising hybridizing first nucleic acid strands to 
immobilized oligonucleotides in an oligonucleotide array prior to 
ligation to second nucleic acid strands, said oligonucleotide 
array comprising an array of predetermined areas on a surface of 
a solid support, each area having copies of an oligonucleotide 
immobilized thereon. 



117. A method according to claim 116 wherein the first nucleic 
acid strands have different nucleotide sequences in each area of 
the array. 

118. A method according to claim 116 wherein the [illegible] 

119. A method according to claim 116 wherein [illegible] 

120. A method according to claim 116 wherein the oligonucleotides 
immobilized in each area are of the same [illegible] 

121. A method according to claim 116 wherein the oligonucleotides 
consist of [illegible] nucleotides, mixed deoxyribonucleotides 
and ribonucleotides, modified deoxyribonucleotides, modified 
ribonucleotides, and non-standard nucleotides. 

122. A method according to claim 116 wherein the second nucleic 
acid strands are not also hybridized to the immobilized oligo- 
nucleotides. 



123. A method according to claim 122 wherein the second nucleic 
acid strands are [illegible] stranded nucleic acids. 

124. A method according to claim 123 wherein the set of double 
stranded nucleic acids has one end adapted for ligation to 
[illegible] hybridization of the first nucleic acids [illegible] 
immobilized oligonucleotides. 

125. A method according to claim [illegible] of the first nucleic 
acid strands [illegible] acids contain priming regions for 
amplification [illegible] 

126. A method according to claim 125 wherein following [illegible] 
of the first nucleic acids to the [illegible] polymerase chain 
reaction amplification is carried out. 

127. A method according to claim 124 wherein the double stranded 
nucleic acids are ligated to the immobilized oligonucleotide 
using RNA ligase prior to ligation of the first nucleic acid 
strands and the second nucleic acid strands. 

128. A method according to claim 123 wherein the second set of 
nucleic acids is the same in every area of the array. 

129. A method according to claim 123 wherein the first nucleic 
acid strands are hybridized to the immobilized oligonucleotides while 
contained in a mixture of one or more different strands, said 
different strands having terminal sequences different from 
corresponding termini to be ligated of the first nucleic acid 
strands. 

130. A method according to claim 116 wherein both the first 
nucleic acid strands and the second nucleic acid strands are 
hybridized to the immobilized oligonucleotides in the array prior 
to ligation. 

131. A method according to claim 130 wherein both the first and 
second nucleic acid strands contain priming regions at their non- 
ligating termini. 

132. A method according to claim 131 wherein the first and 
second nucleic acid strands are amplified in a polymerase chain 
reaction following ligation. 

133. A method according to claim 130 wherein both the first and 
second nucleic acids are, prior to hybridization to the immobi- 
lized oligonucleotides, contained in mixtures of nucleic acids 
having terminal sequences different from the corresponding 
termini to be ligated of the first nucleic acid strands and the 
second nucleic acid strands. 

134. A method according to claim 36 further comprising sorting 
the hybridized nucleic acid strands or their copies in an area of 
the first binary array by contacting them to a second oligo- 
nucleotide array. 

135. A method according to claim 134 wherein the strands or their 
copies are contacted to all areas of the array. 

136. A method according to claim 36 wherein the nucleic acid 
strands are contacted to all areas of a second binary array. 

137. A method according to claim 134 wherein cleavable primers 
are used following said step of contacting for amplification of 
hybridized strands. 

138. A method according to claim 137 further comprising cleaving 
the cleavable primers from the strands and adding new terminal 
extensions. 

139. A method according to claim 134 wherein the contents of an 
area of the first binary array are contacted with only predeter- 
mined areas of a second binary array. 

140. A method according to claim 36 further wherein contents in 
an area of the binary array are contacted with the corresponding 
area of a replica array. 

141. A method according to claim 134 wherein the second oligo- 
nucleotide array is a second binary array. 

142. A method for introducing a site directed mutation into a 
nucleic acid strand on an oligonucleotide array using a partial, 
said partial corresponding to a region of the nucleic acid strand 
adjacent to the location of the site directed mutation to be 
introduced, comprising the steps: 

(a) separately ligating said partial to the free terminus 
of a preselected immobilized oligonucleotide in the oligo- 
nucleotide array to obtain a mutated partial, said oligonucleo- 
tide array comprising an array of predetermined areas on the 
surface of a solid support, each area having therein a pre- 
selected immobilized oligonucleotide, said preselected oligo- 
nucleotide having a sequence adapted to introduce a mutation to 
the partial added to the area; and 

(b) generating, using the mutated partial, a nucleic acid 
containing the mutation. 

143. A method according to claim 142 wherein step b is 
accomplished by 

(a) hybridizing a complementary copy of the mutated partial 
to a template having the complementary sequence of the terminal 
portion of the nucleic acid strand which is not contained in the 
partial; and 

(b) carrying out a polymerase reaction, a ligation reaction 
or both a polymerase reaction and ligation reaction to join the 
remaining region of the nucleic acid strand to the mutated 
partial. 

144. A method for making immobilized partial copies of a nucleic 
acid strand on a 3' or 5' oligonucleotide array, comprising the 
steps: 

(a) hybridizing the strand to the array by an oligo- 
nucleotide segment contained in the strand, said array comprising 
predetermined areas on a surface of a solid support, each area 
having therein immobilized oligonucleotides consisting of a 
predetermined variable sequence, said hybridization taking place 
under conditions that promote the formation of perfect hybrids of 
the length of the immobilized oligonucleotide in each area, and 

(b) where the strand is hybridized to a 3' array, enzymati- 
cally extending the immobilized oligonucleotide using the hybrid- 
ized strand as a template, and where the strand is hybridized to 
a 5' array, hybridizing a primer to a priming region contained in 
the 3' terminus of the hybridized strand, then enzymatically 
extending the primer to form an extension product, then ligating 
the extension product to the immobilized oligonucleotide. 

145. A method according to claim 144 wherein the strand is 
hybridized to a 3' array, further comprising amplifying the 
immobilized partial copies using a primer or promoter complement 
appropriate to hybridize to a priming region or promoter sequence 
at the immobilized partial copies' 3' termini, and an appropriate 
polymerase. 

146. A method according to claim 144 wherein the oligonucleotide 
array is substantially comprehensive. 

147. A method according to claim 146 wherein a substantially 
complete set of immobilized partial copies is generated on the 
array by 

(a) hybridizing the strand to the array by substantially 
all oligonucleotides present in the strand; 

(b) performing step (b) on all hybridized strands. 

148. A method according to claim 146 wherein a substantially 
complete set of amplified partials is generated on a 3' array by 

(a) hybridizing the strand to the 3' array by substantially 
all oligonucleotides present in the strand; 

(b) performing step (b) on all hybridized strands; and 

(c) amplifying substantially all immobilized partial copies 
by using a primer or promoter complement appropriate to hybridize 
to a priming region or promoter sequence at the partial copy's 
fixed terminus, and an appropriate polymerase. 

149. A method according to claim 148 wherein following step (a) 
unhybridized and imperfectly hybridized strand copies are 
removed. 

150. A method according to claim 149 wherein the array is 
sectioned. 

151. A method according to claim 150 wherein the strand is 
contained in a mixture of strands which are subjected to the same 
steps on the array. 

152. A method according to claim 151 wherein the priming region 
is a terminal extension introduced in all strands in the mixture. 

153. A method according to claim 149 wherein the priming region 
or promoter is added to the 5' terminus of the nucleic acid 
strand prior to hybridizing the strand to the array. 

154. A method according to claim 150 further wherein the oligo- 
nucleotide content in an area of the array is surveyed. 

155. The product of a method according to claim 144. 

156. The product of a method according to claim 146. 

157. A method according to claim 144 wherein the strand is 
[illegible] 

158. A method according to claim 157 further wherein mixtures of 
strands from different areas of the sorting oligonucleotide array 
are hybridized to the 3' or 5' oligonucleotide array. 

159. A method according to claim 144 wherein the nucleic acid is 
a previously prepared partial. 

160. A method according to claim 145 further comprising sorting 
partials or their copies from an area of the oligonucleotide 
array on a second oligonucleotide array. 

161. A method according to claim 145 further comprising sorting 
partials or their copies from an area of the oligonucleotide 
array according to variable sequences adjacent their fixed ends 
on a binary oligonucleotide array. 

162. A method of claim 144 further comprising ligating a partial 
or its copy in single stranded or double stranded form to a 
second nucleic acid strand. 

163. A method according to claim 162 wherein the second nucleic 
acid strand is a previously obtained partial. 

164. A method according to claim 145 further wherein a cleavable 
primer, at an end of a partial to be ligated, is used for ampli- 
fication, and further comprising cleaving the primer and then 
ligating the partial to a second nucleic acid strand. 

165. A method according to claim 162 further comprising exponen- 
tially amplifying ligated product using priming regions at non- 
ligated termini. 

166. A method according to claim 165 further wherein the priming 
regions at the non-ligated termini of the ligated product are 
adapted to permit amplification only of the ligated product. 

167. A method according to claim 144 further wherein a partial 
obtained is ligated to an oligonucleotide or to a second nucleic 
acid strand adapted to introduce a site directed mutation, with 
respect to the nucleic acid strand that the partial was generated 
from, at the ligated terminus of the partial. 

168. A method according to claim 167 wherein the oligonucleotide 
is immobilized in a second oligonucleotide array. 

169. A method for sorting partials by their variable termini on a 
binary oligonucleotide array, which partials have been prepared 
by random chemical or enzymatic degradation of one or more 
nucleic acid strands, said binary array comprising an array of 
predetermined areas on a surface of a solid support, each area 
having therein copies of a binary oligonucleotide of a predeter- 
mined sequence, said binary oligonucleotide consisting of a 
constant nucleotide sequence adjacent to a variable nucleotide 
sequence, said variable nucleotide sequence being at the free end 
of the binary oligonucleotides, said binary oligonucleotide also 
having a complementary masking oligonucleotide hybridized to all 
or a part of the constant nucleotide sequence, including the 
portion of the constant nucleotide sequence adjacent the variable 
nucleotide sequence, comprising the steps of: 

(a) hybridizing the partials to the array by their termini 
under conditions that promote the formation of perfect hybrids; 
and 

(b) ligating the termini of the partials to the masking 
oligonucleotide. 

170. A method for obtaining information for determining the 
sequence of a nucleic acid strand comprising 

(a) generating a substantially complete set of partials of 
the nucleic acid strand; and 

(b) for groups of partials, having the same terminal 
variable nucleotide sequence of predetermined length, separately 
determining the presence and sequence of all variable oligo- 
nucleotides of the predetermined length. 
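
The grouping recited in claim 170 can be sketched as follows. The sketch assumes, purely for illustration, that the partials of a strand are modeled as its prefixes (one shared fixed end, one variable end) and that the "terminal variable nucleotide sequence" is the last k nucleotides of each partial; the function names and the sample strand are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k oligonucleotide segments in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def partials_of(strand, k):
    """Assumed model: every prefix of the strand with at least k bases."""
    return [strand[:end] for end in range(k, len(strand) + 1)]


def survey_by_terminal_oligo(strand, k):
    """Group partials by terminal k-mer and survey each group separately."""
    groups = defaultdict(set)
    for partial in partials_of(strand, k):
        # The terminal k-mer acts as the group key; the value collects
        # every k-mer found in partials that end with that key.
        groups[partial[-k:]] |= kmers(partial, k)
    return dict(groups)


# Illustrative strand only (assumed, not taken from the document).
survey = survey_by_terminal_oligo("ACGTAC", 3)
```

Under this model each group lists the oligonucleotides lying between the fixed end and the terminal oligonucleotide, which is positional information that a single survey of the whole strand would not give.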

171. In a method for surveying oligonucleotide content of a 
nucleic acid strand as part of a sequencing method wherein the 
strand is hybridized to a comprehensive oligonucleotide array, 
and the presence of hybridized strands in areas of the array is 
detected, the improvement comprising: 

(a) preparing a substantially complete set of partials of 
the strand prior to surveying; 

(b) sorting the partials by their variable ends on an 
oligonucleotide array, and 

(c) separately surveying oligonucleotide content of each 
group of sorted partials. 

172. A method according to claim 171 wherein the strand is in a 
mixture of strands which are subjected to the same steps. 

173. A method according to claim 172 wherein the substantially 
complete set of partials is prepared by chemical or enzymatic 
degradation of the strands and the strands are sorted on a binary 
oligonucleotide array, said binary array comprising an array of 
predetermined areas on a surface of a solid support, each area 
having therein copies of a binary oligonucleotide of a pre- 
determined sequence, said binary oligonucleotide consisting of a 
constant nucleotide sequence of predetermined length and nucleo- 
tide sequence adjacent to a variable nucleotide sequence. 

174. A method according to claim 173 wherein said binary oligo- 
nucleotide array comprises a 3' array, said immobilized oligo- 
nucleotides consisting of a constant sequence at the 5' terminus 
of a variable sequence. 

175. A method according to claim 172 further comprising 

(a) preparing address sets containing a complete list of 
all oligonucleotides contained in a strand or strands in the 
mixture which share an address oligonucleotide for substantially 
every address in the oligonucleotide array on which the partials 
were sorted; and 

(b) determining whether an address set is a strand set by 
examining whether the address set can be decomposed into other 
address sets. 

176. A method according to claim 175 further comprising organiz- 
ing the oligonucleotides in a strand set into sequence blocks 
composed of oligonucleotides that uniquely overlap each other, 
and ordering the blocks. 
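
The block-building part of claim 176 can be illustrated with a minimal sketch. It assumes that two oligonucleotides of equal length "uniquely overlap" when one overlaps the other by all but one nucleotide and neither has a competing partner for that overlap. The function name is hypothetical, and the sketch does not order the resulting blocks or handle purely circular chains.

```python
def sequence_blocks(oligos):
    """Join equal-length oligos into blocks wherever the overlap is unique."""
    oligos = sorted(set(oligos))
    # successors[o]: oligos that overlap the 3' side of o by all but one base.
    successors = {o: [p for p in oligos if p[:-1] == o[1:]] for o in oligos}
    # predecessors[o]: oligos that overlap the 5' side of o by all but one base.
    predecessors = {o: [p for p in oligos if p[1:] == o[:-1]] for o in oligos}

    def unique_next(o):
        """Return the only possible successor of o, or None if ambiguous."""
        nxt = successors[o]
        if len(nxt) == 1 and len(predecessors[nxt[0]]) == 1:
            return nxt[0]
        return None

    has_unique_prev = {unique_next(o) for o in oligos} - {None}
    blocks = []
    for start in oligos:
        if start in has_unique_prev:
            continue  # this oligo belongs inside a block started elsewhere
        block, current, seen = start, start, {start}
        while True:
            nxt = unique_next(current)
            if nxt is None or nxt in seen:
                break
            block += nxt[-1]  # each unique overlap adds one nucleotide
            seen.add(nxt)
            current = nxt
        blocks.append(block)
    return blocks


# Illustrative oligo sets only (assumed, not taken from the document).
single_block = sequence_blocks(["ACG", "CGT", "GTT", "TTG", "TGC", "GCA"])
split_blocks = sequence_blocks(["ACG", "CGA", "GAC", "ACT"])
```

In the second example the oligonucleotide GAC can be followed by either ACG or ACT, so the chain stops there and two blocks result.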

177. A method of obtaining information to order a set of first 
fragments resulting from digestion of DNA with a first restric- 
tion endonuclease, the nucleotide sequence of said fragments 
having already been determined, comprising 

(a) digesting the DNA with a second restriction endo- 
nuclease to generate a set of second fragments; 

(b) denaturing the second set of fragments to form a 
mixture of single nucleic acid strands; 

(c) sorting strands on a substantially comprehensive 
oligonucleotide array; 

(d) amplifying the strands to generate both their direct 
and complementary copies; 

(e) surveying the contents of individual areas of the array 
on a first binary survey array, said first binary survey array 
comprising an array of predetermined areas on a surface of a 
solid support, each area having therein, covalently linked to 
said surface, copies of a binary oligonucleotide, said binary 
oligonucleotide having a constant nucleotide sequence which 
contains a sequence complementary to the restriction recognition 
site of the first restriction endonuclease and adjacent to a 
variable sequence; and 

(f) surveying the contents of individual areas of the array 
on a second binary survey array, said second binary survey array 
comprising an array of predetermined areas on a surface of a 
solid support, each area having therein, covalently linked to 
said surface, copies of a second binary oligonucleotide, said 
second binary oligonucleotide having a constant nucleotide 
sequence which contains a sequence complementary to the restric- 
tion recognition site of the second restriction endonuclease and 
adjacent to a variable sequence. 

178. A method according to claim 177 wherein in step c strands 
are hybridized to an array selected from the group consisting of 

(a) a first binary sorting array, said first binary sorting 
array comprising an array of immobilized oligonucleotides having 
a constant nucleotide sequence complementary to the restriction 
recognition site of the first restriction endonuclease, adjacent 
to a variable sequence of predetermined length, the immobilized 
oligonucleotides in an individual area of the first binary 
sorting array having the same sequence, and 

(b) a second binary sorting array, said second binary 
sorting array comprising an array of immobilized oligonucleotides 
having a constant nucleotide sequence complementary to the 
restriction recognition site of the second restriction endo- 
nuclease, adjacent to a variable sequence of predetermined 
length, the immobilized oligonucleotides in an individual area of 
the second binary sorting array having the same sequence, 

and wherein following hybridization unhybridized and imper- 
fectly hybridized strands are removed. 

179. A method for obtaining information to allocate sequenced and 
ordered fragments from an original restriction digest of DNA from 
sister chromosomes to chromosomal linkage groups comprising 

(a) preparing a partial on an oligonucleotide array from a 
restriction fragment from an alternate restriction digest of the 
DNA, which partial spans first and second allelic differences in 
neighboring pairs of sequenced fragments from the original 
restriction digest; and 

(b) determining the presence of oligonucleotides containing 
the first and second allelic differences in a partial which spans 
the first and second allelic differences. 

180. A method according to claim 179 wherein 

(a) in step b, the restriction fragment from the alternate 
restriction digest is hybridized to the oligonucleotide array by 
an oligonucleotide containing the first allelic difference; and 

(b) the presence of an oligonucleotide containing the 
second allelic difference is determined by hybridizing the 
partial to a complementary second variable nucleotide sequence in 
an oligonucleotide array and then detecting the presence of the 
partial in the corresponding area of the oligonucleotide array. 

181. A method for surveying oligonucleotides in a nucleic acid 
strand comprising 

(a) randomly degrading the strand into pieces, the average 
length of said pieces slightly exceeding the length of oligo- 
nucleotides surveyed; 

(b) ligating the pieces to a ligating oligonucleotide 
complementary to at least a portion of a constant sequence of 
immobilized oligonucleotides in a binary array; 

(c) hybridizing the pieces to the binary array, said binary 
array having immobilized oligonucleotides in an ordered array 
therein and consisting of a constant sequence adjacent to a 
variable sequence, the immobilized oligonucleotides in an 
individual area of the array having the same sequence; and 

(d) detecting the hybrids formed. 

182. A method according to claim 181 wherein the array is a 3' 
array having the variable sequence at the 3' termini of the 
immobilized oligonucleotides, further comprising, following step 
(c), extending the immobilized oligonucleotides with a polymerase 
using hybridized pieces as templates. 

183. A method according to claim 182 wherein the strand is a DNA 
strand resulting from a digest with a restriction endonuclease, 
and melting apart of the fragments obtained thereby or a partial 
obtained from said strand, and wherein the constant sequence 
contains the restriction endonuclease recognition site. 

184. A method according to claim 183 wherein dideoxynucleotides 
are used as substrates during extension of the immobilized 
oligonucleotides using a DNA polymerase. 

185. A method according to claim 181 wherein the ligating oligo- 
nucleotide is pre-hybridized to the constant immobilized oligo- 
nucleotide prior to ligation to the pieces. 

186. In a primer dependent polymerase reaction for amplification 
of a nucleic acid in which a primer is hybridized to a template 
strand and extended by incubation with a primer dependent poly- 
merase and nucleotide substrates to generate a complementary copy 
of the template strand; the improvement wherein: 

the primer or a part thereof contains one or more primer 
nucleotides that are chemically different from nucleotide sub- 
strates incorporated in the complementary copy of the template 
during the amplification, said chemical difference causing the 
primer to be cleavable without cleaving the part of the com- 
plementary copy generated during amplification. 

187. A method according to claim 186 further wherein the primer 
is selectively cleaved without cleaving the part of the com- 
plementary copy generated during amplification. 

188. A method according to claim 187 wherein the primer or a part 
thereof contains one or more ribonucleotides triphosphates, and 
the substrates used for amplification are deoxyribonucleoside 
triphosphates and the primer is cleaved by a chemical or enzy- 
matic reaction which cleaves nucleic strands immediately 3' of 
ribonucleotides but not 3' of deoxyribonucleotides. 

189. A method according to claim 188 wherein the chemical reac- 
tion or enzymatic reaction is selected from the group consisting 
of 

(a) alkaline hydrolysis; 

(b) hydrolysis by a magnesium formamide mixture; and 

(c) ribonuclease digestion. 

190. A method according to claim 188 wherein a ribonucleotide is 
present at the 3' terminus of the primer. 
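
The selective cleavage recited in claims 188 and 190 can be sketched as follows. The sketch assumes a strand is represented as a 5'-to-3' list of (base, sugar) pairs and that the cleaving reaction cuts immediately 3' of every ribonucleotide and nowhere else; the representation, names, and sequences are hypothetical.

```python
def cleave_after_ribonucleotides(strand):
    """Split a strand immediately 3' of each ribonucleotide.

    `strand` is a 5'->3' list of (base, sugar) pairs, where sugar is
    "ribo" or "deoxy" (an assumed representation for this sketch).
    """
    fragments, current = [], []
    for base, sugar in strand:
        current.append((base, sugar))
        if sugar == "ribo":
            # The bond on the 3' side of a ribonucleotide is cleaved.
            fragments.append(current)
            current = []
    if current:
        fragments.append(current)
    return fragments


# Primer with a single ribonucleotide at its 3' terminus, followed by
# the deoxyribonucleotides added during extension (illustrative only).
primer = [("G", "deoxy"), ("A", "deoxy"), ("U", "ribo")]
extension = [("C", "deoxy"), ("T", "deoxy"), ("G", "deoxy")]
fragments = cleave_after_ribonucleotides(primer + extension)
```

With the ribonucleotide at the primer's 3' terminus, the cut falls exactly at the primer/extension junction, so the part of the copy generated during amplification is released intact.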

191. A method according to claim 187 wherein said nucleotide 
substrates used for amplification are modified at their alpha 
phosphate groups so that resulting modified phosphodiester bonds 
in the complementary copy generated during amplification are 
resistant to cleavage by a nuclease, said nuclease being chosen 
to be incapable of cleaving said resulting modified phospho- 
diester bonds, further wherein one or more primer phosphodiester 
bonds are not modified to be resistant to said cleavage, and 
wherein said primer is cleaved by treatment with said nuclease. 

192. A method according to claim 191 wherein said nucleotide 
substrates modified at their alpha phosphate groups are nucleo- 
side alpha-thiophosphates. 

193. A method according to claim 191 wherein the nucleotide 
substrates used for amplification are modified deoxy- 
ribonucleotides. 

194. An array of oligonucleotide arrays comprising a solid 
sheet having a surface and an array comprising a pattern of 
miniaturized oligonucleotide arrays on said surface, each minia- 
turized array comprising an array of predetermined areas on said 
surface, each area having therein, covalently linked to said 
surface, multiple copies of an oligonucleotide of a predetermined 
sequence. 

195. A method according to claim 68 further comprising 

(a) contacting at least one area of said array containing 
the immobilized copies with at least one oligonucleotide probe 
having a predetermined sequence, under conditions promoting 
hybridization of said at least one probe; and 

(b) determining whether or not said at least one probe has 
hybridized to said at least one area. 

196. A method according to claim 144 further comprising 

(a) contacting at least one area of said array containing 
the immobilized partial copies with at least one oligonucleotide 
probe having a predetermined sequence, under conditions promoting 
hybridization of said at least one probe; and 

(b) determining whether or not said at least one probe has 
hybridized to said at least one area. 

197. A method according to claim 170, wherein determining the 
presence and sequence of all variable oligonucleotides comprises 

(a) contacting said substantially complete set of partials 
with a substantially comprehensive set of oligonucleotide probes, 
each of a predetermined length, under conditions promoting 
hybridization of said probes; and 

(b) determining to which partials each said probe has 
hybridized. 